Hi
JSON is a must-have when dealing with servers or high-level languages.
I recently had to try my luck with Python, which has a large number of libraries, including one for JSON.
To exchange data with my applications, I started writing my own JSON read/write routines.
There are many libraries out there, but I wanted to integrate the functionality seamlessly into my code.
My first attempt is a simple write routine (stringify) that is encapsulated in an object.
When I have a little more time, I will complete the read (parse) routine.
It's a work in progress. :biggrin:
If anyone wants to try it, the source is attached.
Regards, Biterider
Hi
Today I finished the JSON parser, which strictly follows the guidelines of https://www.json.org/.
The ANSI compilation target was a little tricky, as the native JSON format is UTF-8. The solution to this problem is to use wide strings internally, which are translated into the current code page at the end.
The WIDE compilation target was easier to implement thanks to the UTF8ToWide function.
I successfully tested the code on several JSON files that I found on the internet.
A note on numeric and boolean values: both are recognized when reading, but not translated into a value.
It is up to the host to interpret the string value and convert it to the desired format, e.g. BYTE, WORD, DWORD, QWORD, REAL4, REAL8 etc.
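Since the parser hands back raw strings, the host-side conversion step looks roughly like this. A minimal Python sketch for illustration only (the function names are hypothetical, not part of the library):

```python
# The parser returns raw strings; the host decides the target type.
# These helper names are illustrative, not the actual library API.

def json_as_int(s: str) -> int:
    """Interpret a JSON number string as an integer (e.g. BYTE..QWORD)."""
    return int(s)

def json_as_real(s: str) -> float:
    """Interpret a JSON number string as a float (e.g. REAL4/REAL8)."""
    return float(s)

def json_as_bool(s: str) -> bool:
    """Interpret the JSON literals 'true'/'false'."""
    return s == "true"

print(json_as_int("255"), json_as_real("2.5"), json_as_bool("true"))  # 255 2.5 True
```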
Performance testing remains for the next few days, but I am confident it will be very acceptable. :biggrin:
The source is attached to the first post.
Regards, Biterider
Hi
Finally, I finished JSON support and added three routines for encoding and decoding JSON escape sequences:
JsonEscDecode: converts a string containing escape sequences into a plain wide string.
JsonEscEncode: converts a wide string containing special characters into a JSON string.
JsonEscEncodeSize: calculates the byte size required to allocate a JSON string.
Note: these routines are very similar to the percent-encoding of URL strings (https://en.wikipedia.org/wiki/Percent-encoding) :icon_idea:
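For illustration, a hedged Python sketch of what a decoder along the lines of JsonEscDecode does (simple escapes plus 4-digit \uXXXX; surrogate pairs and error handling are omitted for brevity, and this is not the actual library code):

```python
# Map of the one-character JSON escapes.
_SIMPLE = {'"': '"', '\\': '\\', '/': '/', 'b': '\b', 'f': '\f',
           'n': '\n', 'r': '\r', 't': '\t'}

def json_esc_decode(s: str) -> str:
    """Decode JSON escape sequences into a plain string (sketch)."""
    out, i = [], 0
    while i < len(s):
        c = s[i]
        if c == '\\':
            esc = s[i + 1]
            if esc == 'u':                          # 4-digit hex escape
                out.append(chr(int(s[i + 2:i + 6], 16)))
                i += 6
            else:                                   # one-character escape
                out.append(_SIMPLE[esc])
                i += 2
        else:
            out.append(c)
            i += 1
    return ''.join(out)

print(repr(json_esc_decode('A\\u00A9\\n')))  # 'A©\n'
```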
Biterider
Hi
I would like to post a link to a very handy tool written in asm by fearless to display the JSON information in the form of a tree structure. :thumbsup:
It has already been mentioned a few times here on the forum; the direct link is:
https://github.com/mrfearless/cjsontree
Biterider
Thanks Biterider,
Hopefully it's useful to someone. It is similar to the xmltree tool that is on the forums somewhere, but more developed. It originally started as a proof of concept for using the cJSON library (https://github.com/DaveGamble/cJSON), but it grew over time with iterations to fix issues and add a few more features. I had plans to add some more stuff, but my interest only carried it so far. I found using JSON preferable to XML, as it seems less crowded and doesn't use the tag syntax of XML (which can add to the size of the file/data).
https://www.guru99.com/json-vs-xml-difference.html
JSON sample
{
  "student": [
    {
      "id": "01",
      "name": "Tom",
      "lastname": "Price"
    },
    {
      "id": "02",
      "name": "Nick",
      "lastname": "Thameson"
    }
  ]
}
XML sample
<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <student>
    <id>01</id>
    <name>Tom</name>
    <lastname>Price</lastname>
  </student>
  <student>
    <id>02</id>
    <name>Nick</name>
    <lastname>Thameson</lastname>
  </student>
</root>
I am more used to working with XML; I did not know JSON, but there does not seem to be much difference between one and the other. What I find interesting is that there seem to be NoSQL databases that use this structure to store data, which I have never seen, and I don't see their usefulness over SQL databases.
Quote
MongoDB is a document database, which means it stores data in JSON-like documents. We believe this is the most natural way to think about data, and is much more expressive and powerful than the traditional row/column model.
https://www.mongodb.com/
Quote
When people use the term "NoSQL database", they typically use it to refer to any non-relational database. Some say the term "NoSQL" stands for "non SQL" while others say it stands for "not only SQL." Either way, most agree that NoSQL databases are databases that store data in a format other than relational tables.
A common misconception is that NoSQL databases or non-relational databases don't store relationship data well. NoSQL databases can store relationship data—they just store it differently than relational databases do. In fact, when compared with SQL databases, many find modeling relationship data in NoSQL databases to be easier than in SQL databases, because related data doesn't have to be split between tables.
https://www.mongodb.com/nosql-explained
Ah, BTW, thank you for the code :thumbsup:
Hi Biterider!
There is a little problem in the encoding translation with some unusual codes (mostly special characters :biggrin:).
SysSetup OOP, WIN32, ANSI_STRING
OCall [xsi].DskStreamIn::DiskStream.Init, NULL, $OfsCStr("data0.json"), GENERIC_READ, \
0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0
OCall xsi.Read, addr [xsi].DskStreamIn
OCall [xsi].DskStreamOut::DiskStream.Init, NULL, $OfsCStr("data1.json"), GENERIC_WRITE, \
0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0
New MemoryStream
mov pBufferM, xax
OCall pBufferM::MemoryStream.Init, xsi, 5*1024, 5*1024, -1
OCall xsi.Write, pBufferM
OCall xsi.Write, addr [xsi].DskStreamOut
Apparently there is no problem in pBuffer (which is shown in a RichEdit), but some codes are changed between data0 and data1:
E28093 -> 96
C2A9 -> A9
C2AE -> AE
That looks correct, because codepage 1252 is used by default, but the JSON routines fail to create the tree from data1 with these changes.
Manually replacing the first new code with 2Dh (minus sign) and erasing the other two works perfectly.
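These substitutions are consistent with a UTF-8 to CP1252 conversion, which can be checked by decoding the sequences by hand. A small Python sketch, independent of the library:

```python
def utf8_codepoint(b: bytes) -> int:
    """Decode one 2- or 3-byte UTF-8 sequence to its code point."""
    if b[0] & 0xE0 == 0xC0:        # 2-byte sequence
        return ((b[0] & 0x1F) << 6) | (b[1] & 0x3F)
    if b[0] & 0xF0 == 0xE0:        # 3-byte sequence
        return ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)
    return b[0]                    # plain ASCII

# E2 80 93 -> U+2013 (en dash), which CP1252 stores as the single byte 96h
print(hex(utf8_codepoint(bytes([0xE2, 0x80, 0x93]))))  # 0x2013

# The same mapping via the standard codecs:
print(b'\xe2\x80\x93'.decode('utf-8').encode('cp1252'))  # b'\x96'
print(b'\xc2\xa9'.decode('utf-8').encode('cp1252'))      # b'\xa9'
```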
Thanks, HSE.
Hi HSE
I'll look into it. Can you send me the input file (data0.json)?
Regards, Biterider
data0 is UTF-8 and data1 is ANSI
Hi
I found the problem. Contrary to my statement in the JSON description "This implementation uses WIDE strings internally", when compiling for ANSI strings, some conversions happen and the result is saved as ANSI. For this reason, when writing back the JSON file, the result is ANSI. If you compile the application for WIDE strings, everything works fine as it should.
The solution to this situation is to really convert everything to WIDE strings, with the consequence that, for example, the search for a specific key must be done with a WIDE string, regardless of the TARGET_STR_TYPE setting.
Before changing this, I would like to discuss it first. :rolleyes:
Biterider
I would avoid ANSI. You can perfectly convert Utf8 to Utf16 and back, but Ansi<->Utf16 is a different story. There is a reason why the Internet runs on Utf8.
Quote from: Biterider on December 10, 2022, 08:35:45 PM
Contrary to my statement in the JSON description "This implementation uses WIDE strings internally"
Also, you state "The default encoding is UTF-8." Then, ideally, something loaded as UTF-8 must be saved as UTF-8 (unless some explicit option says otherwise). But that is just a feature.
Quote from: Biterider on December 10, 2022, 08:35:45 PM
, when compiling for ANSI strings, some conversions happen and the result is saved as ANSI. For this reason, when writing back the JSON file, the result is ANSI.
If I am not doing something wrong :biggrin:, then, from a programming point of view, here is the problem: the JSON routines save in extended ANSI encoding, but can only read reduced ANSI encoding.
Quote from: Biterider on December 10, 2022, 08:35:45 PM
Before changing this, I would like to discuss it first. :rolleyes:
For what I want, making the JSON routines correctly load extended ANSI encoding could be more than enough :thumbsup:
I don't know if it is worth the effort to change things so that a WIDE application can work with ANSI processes and files. If it comes for free, that could be nice :biggrin:
Perhaps there could be a more general Json.inc and a JsonECMA.inc.
For example, in Android Java a JSON stream "must be" UTF-8. But there are no limitations on the file encoding; you just have to make an independent conversion first:
File file = new File("My.json");
FileInputStream inputStream = new FileInputStream(file);
int size = inputStream.available();
byte[] buffer = new byte[size];
inputStream.read(buffer);
inputStream.close();
json = new String(buffer, "ISO-8859-1"); // decode the ANSI (Latin-1) bytes into a Java string
MyJson = new JSONObject(json);
Hi
Looking at some sources of information I stumbled across the RFC JSON specification.
Specifically, chapter 8.1 (https://www.rfc-editor.org/rfc/rfc8259#section-8.1) defines:
Quote
JSON text exchanged between systems that are not part of a closed ecosystem MUST be UTF-8 encoded
I read it like this:
If you want to achieve interoperability between applications, you must use UTF8.
When you do your own thing you can do whatever you prefer...
Biterider
Hi
Reading further
Quote
Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a networked-transmitted JSON text. In the interests of
interoperability, implementations that parse JSON texts MAY ignore
the presence of a byte order mark rather than treating it as an
error.
This is something that needs to be added to the current implementation :icon_idea:
What puzzles me is that U+FEFF is the BOM for UTF-16. I would expect the 3-byte BOM EF;BB;BF for UTF8... :rolleyes:
Later: I got it. The 3 byte BOM is the UTF8 encoding of U+FEFF :biggrin:
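This is easy to verify by encoding U+FEFF by hand; a small Python check (illustrative only, not part of the library):

```python
def utf8_encode_bmp(cp: int) -> bytes:
    """Encode a BMP code point as UTF-8 (1 to 3 bytes)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    return bytes([0xE0 | (cp >> 12),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# U+FEFF really is the familiar 3-byte sequence EF BB BF
print(utf8_encode_bmp(0xFEFF).hex())  # efbbbf
```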
Biterider
Perfect :thumbsup:
And I like the definition: I have a "closed ecosystem" :biggrin: :biggrin:
Quote from: Biterider on December 11, 2022, 07:33:23 PM
Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a networked-transmitted JSON text. In the interests of
interoperability, implementations that parse JSON texts MAY ignore
the presence of a byte order mark rather than treating it as an
error.
A silly rule. Every parser can find the BOM in a few microseconds, set its encoding, and then ignore the two or three bytes. It is much, much more difficult to determine if a text is Ansi (which codepage?), Utf8 or Utf16.
Practically all webpages have a
<meta charset="utf-8"> on top. That is a 22-byte "BOM".
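The sniffing step described above can be sketched in a few lines of Python (illustrative only, not the library code): look at the first bytes, pick the encoding, and report how many bytes to skip.

```python
def sniff_bom(data: bytes):
    """Detect a leading BOM; return (encoding or None, bytes to skip)."""
    if data.startswith(b'\xef\xbb\xbf'):
        return 'utf-8', 3
    if data.startswith(b'\xff\xfe'):
        return 'utf-16-le', 2
    if data.startswith(b'\xfe\xff'):
        return 'utf-16-be', 2
    return None, 0   # no BOM: fall back to a default or guess

enc, skip = sniff_bom(b'\xef\xbb\xbf{"a": 1}')
print(enc, skip)  # utf-8 3
```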
Quote
Later: I got it. The 3 byte BOM is the UTF8 encoding of U+FEFF :biggrin:
Nice find :tongue:
Quote from: HSE on December 10, 2022, 05:42:31 AM
data0 is UTF-8 and data1 is ANSI
I found that my use of the name "ANSI" is totally wrong!! (but almost everybody uses it the same way :biggrin:)
The character encoding to which I refer is a superset of ISO 8859-1 (which is something like ASCII + Latin-1 Supplement) in terms of printable characters, but differs from it and has additional characters. It is known to Windows by the code page number 1252, and by the IANA-approved name "windows-1252" (https://en.wikipedia.org/wiki/Windows-1252).
Quote
Microsoft explains, "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."
Quite interesting :thumbsup:
Agreed. "Ansi" in its common usage is a family of character sets that comprises Windows-1252 but also the popular "DOS graphics" charset (https://en.wikipedia.org/wiki/Code_page_437#Character_set).
CP_UTF8 is a different animal.
Hi Biterider!
When assembling in ANSI mode, these few modifications allow you to read UTF-8 or ANSI, even if it is not very efficient for ANSI (and it will always save ANSI).
JSON_FILE_UTF8 equ 0
JSON_FILE_ANSI equ 1
Method Json.Read, uses xbx xdi xsi, pStream:$ObjPtr(Stream), xTipo:XWORD
local lMemBlockUTF8:XWORD
· · ·
mov xbx, $OCall(pStream::Stream.GetSize)
.if eax != -1
mov lMemBlockUTF8, xax
· · ·
.if xTipo == JSON_FILE_UTF8
invoke UTF8ToWide, pMemBlockWide, pMemBlockUTF8, ebx
.else
invoke MultiByteToWideChar,CP_ACP,0,pMemBlockUTF8,lMemBlockUTF8,pMemBlockWide, ebx
.endif
· · ·
MethodEnd
Regards, HSE.
Hi HSE
Good idea. :thumbsup:
If you are thinking about compiling this thing for ANSI, there is a way to simplify it.
What about doing all internal processing in ANSI? The only problem is reading a UTF8 stream, but there is no solution for that if a code point is beyond the ANSI range. :sad:
This means that compiling for ANSI falls into the "closed ecosystem" category. :biggrin:
Such a change can be easily implemented.
Biterider
Quote from: Biterider on December 14, 2022, 05:28:13 AM
The only problem is reading a UTF8 stream
Remember that streams are agnostic regarding codepages. For an assembler and for the Windows functions ending with -A, "Ansi" and Utf8 are the same thing. It is when you reconvert it to Utf16 that a MessageBoxW looks arabic or chinese.
Biterider,
Quote from: Biterider on December 14, 2022, 05:28:13 AM
The only problem is reading a UTF8 stream, but there is no solution for that if the code point is beyond the ANSI range. :sad:
No big problem. These days I am not reading that much Russian, Chinese or Quenya :biggrin: :biggrin:
Quote from: Biterider on December 14, 2022, 05:28:13 AM
This means that compiling for ANSI falls into the "closed ecosystem" category. :biggrin:
What about when all internal processing is done with ANSI?
The idea behind not simplifying anything is that assembling for WIDE "opens the ecosystem a little": you can read ANSI or UTF8, and always save UTF8 :biggrin:
I just can't test it because I get a nice crash when the app is assembled for WIDE :sad: (some interface problem, I guess)
Later:
:thumbsup: Assembling for WIDE can read ANSI and save UTF8
HSE
Quote from: jj2007 on December 14, 2022, 06:05:53 AM
Remember that streams are agnostic regarding codepages. For an assembler and for the Windows functions ending with -A, "Ansi" and Utf8 are the same thing. It is when you reconvert it to Utf16 that a MessageBoxW looks arabic or chinese.
I was sending an UTF8 stream to a Wide control :eusa_snooty:
Quote from: HSE on December 14, 2022, 06:35:49 AM
I was sending an UTF8 stream to a Wide control :eusa_snooty:
That won't work :biggrin:
Coding something similar to wRec$(someUtf8$) (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1419) shouldn't take you more than ten minutes, though: MultiByteToWideChar is your friend, of course.
Quote from: jj2007 on December 14, 2022, 09:19:26 AM
Coding something similar to wRec$(someUtf8$) (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1419) shouldn't take you more than ten minutes, though: MultiByteToWideChar is your friend, of course.
Yes, but it could be more interesting to obtain the stream directly from the JSON tree processing, in case a new UTF8 stream is unnecessary and you want a specific text format.