Parsing Text file in Assembly Language

Started by NoCforMe, July 07, 2023, 12:27:01 PM

NoCforMe

I've written a paper that covers this topic, attached below.

This may or may not be what you're looking for. This is what I consider an excellent technique for parsing just about any kind of text. I've used it many times. It's based on assembly language but could be adapted for any other language. While it may look a bit complex, it's actually pretty simple and straightforward, once you get the basic concept.

There are actually two distinct phases here. The first is "tokenization" (also called "lexical analysis"): scanning the text stream and separating its elements into "tokens". The second is the actual parsing process (syntactic analysis), where the stream of tokens is analyzed and the parser takes actions depending on which tokens it sees.
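To make the two phases concrete, here is a minimal sketch in Python rather than assembly. It is not taken from the attached paper: the `key = value` input format, the token kinds (WORD, EQUALS, EOL) and the function names are all assumptions chosen for illustration.

```python
# Hypothetical two-phase example: the input format, token kinds and
# function names are illustrative assumptions, not the paper's design.

def tokenize(text):
    """Phase 1: split the character stream into (kind, value) tokens."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in " \t\r":
            i += 1                          # blanks only separate tokens
        elif ch == "\n":
            tokens.append(("EOL", ch))
            i += 1
        elif ch == "=":
            tokens.append(("EQUALS", ch))
            i += 1
        elif ch.isalnum() or ch == "_":
            start = i
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
            tokens.append(("WORD", text[start:i]))
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


def parse(tokens):
    """Phase 2: walk the token stream and act on WORD EQUALS WORD lines."""
    settings = {}
    pos = 0
    while pos < len(tokens):
        if tokens[pos][0] == "EOL":         # blank line: nothing to do
            pos += 1
            continue
        kinds = [kind for kind, _ in tokens[pos:pos + 3]]
        if kinds != ["WORD", "EQUALS", "WORD"]:
            raise SyntaxError(f"expected 'name = value' at token {pos}")
        settings[tokens[pos][1]] = tokens[pos + 2][1]   # the parser's "action"
        pos += 3
        if pos < len(tokens) and tokens[pos][0] != "EOL":
            raise SyntaxError(f"extra token after value at token {pos}")
    return settings
```

The point of the split is that `parse` never looks at individual characters; it only sees token kinds, so either phase can be changed without touching the other.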

Anyhow, you might give it a look-see, find out if this might work for you. Like I say, it's a very flexible tool. The really nice thing about it is that it isn't a mess of conditional statements, all nested and snarled up like a box of snakes: it's a table-driven process. Once you've diagrammed your parsing task and created the tokenization table, the code practically writes itself.
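As a rough illustration of what "table-driven" can mean here, the sketch below puts all the decisions in a lookup table keyed by (state, character class), so the scanning loop itself has no nested conditionals. It is written in Python for brevity and is only one possible reading: the states, character classes and actions are hypothetical and not necessarily those used in the paper.

```python
# Hypothetical table-driven tokenizer. States, character classes and
# actions are illustrative assumptions.

LETTER, DIGIT, SPACE, OTHER = range(4)      # character classes
IDLE, WORD, NUMBER = range(3)               # states
APPEND, SKIP, EMIT = range(3)               # actions

# (state, character class) -> (next state, action)
TABLE = {
    (IDLE,   LETTER): (WORD,   APPEND),
    (IDLE,   DIGIT):  (NUMBER, APPEND),
    (IDLE,   SPACE):  (IDLE,   SKIP),
    (WORD,   LETTER): (WORD,   APPEND),
    (WORD,   DIGIT):  (WORD,   APPEND),
    (WORD,   SPACE):  (IDLE,   EMIT),
    (NUMBER, DIGIT):  (NUMBER, APPEND),
    (NUMBER, SPACE):  (IDLE,   EMIT),
}

TOKEN_KIND = {WORD: "WORD", NUMBER: "NUMBER"}


def classify(ch):
    """Map a character to its class (the column of the table)."""
    if ch.isalpha() or ch == "_":
        return LETTER
    if ch.isdigit():
        return DIGIT
    if ch.isspace():
        return SPACE
    return OTHER


def tokenize(text):
    tokens, buffer, state = [], [], IDLE
    for ch in text + " ":                   # trailing blank flushes the last token
        key = (state, classify(ch))
        if key not in TABLE:                # no table entry means a syntax error
            raise ValueError(f"unexpected character {ch!r}")
        next_state, action = TABLE[key]
        if action == APPEND:
            buffer.append(ch)
        elif action == EMIT:
            tokens.append((TOKEN_KIND[state], "".join(buffer)))
            buffer.clear()
        state = next_state                  # SKIP needs no work at all
    return tokens
```

With this layout, supporting a new kind of token means adding rows to `TABLE`, not new branches to the loop. In assembly the same table would typically be an array indexed by state and character class.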

Note: Since the attachment (PDF) was larger than 512 KB, I had to break it into 2 pieces. The 2nd part is attached to the reply to this post.
Assembly language programming should be fun. That's why I do it.

NoCforMe

Here's the 2nd part of the PDF attachment. If this looks like something you'd want to use, or if you have questions, LMK.

HSE

Quote from: NoCforMe on July 07, 2023, 12:27:01 PM
I've written a paper that covers this topic, attached below.

Looks like impressive work  :thumbsup:
Equations in Assembly: SmplMath

jj2007

Quote from: NoCforMe on July 07, 2023, 12:27:01 PM
I've written a paper that covers this topic, attached below.

You put a lot of work into that :thumbsup:

However, it seems that sepult has lost interest :rolleyes:

NoCforMe

Quote from: jj2007 on July 08, 2023, 01:06:29 AM
Quote from: NoCforMe on July 07, 2023, 12:27:01 PM
I've written a paper that covers this topic, attached below.

You put a lot of work into that :thumbsup:

However, it seems that sepult has lost interest :rolleyes:

That's OK; I realize this isn't for everybody. Eventually someone will come along here who finds this at least somewhat intriguing. We'll only have to wait, say, a couple years ...

Seriously, I'd be really stoked to see someone use this, since it has worked so well for me over the decades.

mineiro

Interesting; it's the first step in writing translators, preprocessors, converters, scripts, and so on.

Basically, you specify which characters are allowed in the text, which are not allowed, which character marks the end of a line, and which character(s) the tokenizer uses. From there, you register the words that will be returned as identifiers.
Next comes logical precedence: the priority of some symbols over others. Extra functions can handle conversions, for example turning a hexadecimal string into a number.
The glib library has a lexical scanner; I used it a lot when I migrated to another OS.
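The configuration described above (allowed characters, an end-of-line character, registered words, and a conversion helper) might look roughly like this in Python. Everything here is an assumption made for illustration: the character sets, the keyword codes, treating blanks and commas as the tokenizer's separators, and the MASM-style trailing `h` on hexadecimal numbers. It does not use or imitate the glib scanner's API.

```python
import string

# Hypothetical scanner configuration; all names and values are illustrative.
ALLOWED = set(string.ascii_letters + string.digits + "_")
DELIMITERS = set(" \t,")                # assumed separator characters
END_OF_LINE = "\n"
KEYWORDS = {"mov": 1, "add": 2}         # registered words -> identifier codes


def hex_to_int(digits):
    """Convert a string of hexadecimal digits to a number."""
    if not digits:
        raise ValueError("empty hexadecimal string")
    value = 0
    for ch in digits:
        digit = "0123456789abcdef".find(ch.lower())
        if digit < 0:
            raise ValueError(f"not a hexadecimal digit: {ch!r}")
        value = value * 16 + digit
    return value


def classify_word(word):
    """Decide what a completed word is: hex number, keyword, or plain name."""
    if word[0].isdigit() and word[-1] in "hH":
        return ("NUMBER", hex_to_int(word[:-1]))
    if word.lower() in KEYWORDS:
        return ("KEYWORD", KEYWORDS[word.lower()])
    return ("NAME", word)


def scan(text):
    results, word = [], []
    for ch in text + END_OF_LINE:       # final end-of-line flushes the last word
        if ch in ALLOWED:
            word.append(ch)
        elif ch in DELIMITERS or ch == END_OF_LINE:
            if word:
                results.append(classify_word("".join(word)))
                word.clear()
        else:
            raise ValueError(f"character not allowed: {ch!r}")
    return results
```

For example, `scan("mov eax, 0FFh")` yields a keyword code for `mov`, a plain name for `eax`, and the number 255 for `0FFh`. Symbol precedence is left out of this sketch.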

Good job, reminded me of the red dragon book.

I'd rather be this ambulant metamorphosis than to have that old opinion about everything

NoCforMe

I got the core idea for my parsing scheme from a computer science book I read back in the 1980s, forget exactly which one (it wasn't Knuth, which I also took a look at), which described the workings of a finite-state automaton (FSA). It was one of the few things in the book that wasn't completely over my head at the time.

HSE

Art of Assembly? It has a nice chapter about finite state machines.

NoCforMe

Quote from: HSE on July 08, 2023, 06:15:12 AM
Art of Assembly? It has a nice chapter about finite state machines.

No, the book had nothing to do with any language; it was a general computer science text.

Dang, wish I had it now; I might be able to understand more of it. It covered stuff like hashing, sparse-text tables, compiler construction, etc. Plus the usual sort algorithms, etc.

mineiro

Quote from: NoCforMe on July 08, 2023, 06:20:09 AM
Dang, wish I had it now; I might be able to understand more of it. It covered stuff like hashing, sparse-text tables, compiler construction, etc. Plus the usual sort algorithms, etc.
I think it could be Algorithms by Robert Sedgewick, Brown University, 1983-1984.
Great book.

Quote from: HSE on July 08, 2023, 06:15:12 AM
Art of Assembly? It has a nice chapter about finite state machines.
AoA has a nice chapter dealing with boolean operators.

NoCforMe

Quote from: mineiro on July 08, 2023, 07:08:55 AM
Quote from: NoCforMe on July 08, 2023, 06:20:09 AM
Dang, wish I had it now; I might be able to understand more of it. It covered stuff like hashing, sparse-text tables, compiler construction, etc. Plus the usual sort algorithms, etc.
I think it could be Algorithms by Robert Sedgewick, Brown University, 1983-1984.
Great book.

Could be.

If you really want the book on computer programming, that would be Donald Knuth's multivolume The Art of Computer Programming. I only wish I could understand more than about 10% of it.

Quote from: HSE on July 08, 2023, 06:15:12 AM
Art of Assembly? It has a nice chapter about finite state machines.

I wonder how close their technique for using them is to mine. Wouldn't be surprised if it was similar; only so many ways to skin a cat. (Sorry, kitty!)

NoCforMe

Hey, I'll make this offer: if anyone (including the OP) posts the spec for something they want parsed, I'll create a parser for it using my scheme and post the code here. (Assuming it's not too complicated!) Be sure to specify exactly the text you need to be interpreted.

HSE

I don't know what OP is. Here, "opa" is a big person with some kind of mental deficiency but a good heart (the word is not used much these days).

First we have to test your examples  :thumbsup:

NoCforMe

OP = original poster (they who started the thread)

jj2007

Quote from: NoCforMe on July 08, 2023, 02:02:57 PM
Hey, I'll make this offer: if anyone (including the OP) posts the spec for something they want parsed, I'll create a parser for it using my scheme and post the code here. (Assuming it's not too complicated!) Be sure to specify exactly the text you need to be interpreted.

I can offer a haystack, i.e. a fat text to be parsed: http://www.jj2007.eu/Bible.zip

Quote from: jj2007 on April 28, 2023, 04:35:21 PM
UnzipFile:

include \masm32\MasmBasic\MasmBasic.inc
  Init
  UnzipInit "http://www.jj2007.eu/Bible.zip" ; file or URL
  UnzipFile(0, "C:\Masm32") ; extract C:\Masm32\Bible.txt
EndOfCode