Parsing Text file in Assembly Language

Started by NoCforMe, July 07, 2023, 12:27:01 PM

HSE

Just in case: this is mineiro's challenge  :thumbsup:

I'm still on the trivial B-to-C thing  :biggrin:
(Solved now!)
Equations in Assembly: SmplMath

jj2007

A few lines for testing your algos:

invoke MyAlgo, eax, "a string", FP4(456.789)
invoke MyAlgo, eax, 123, 456
InVoke someApi123, 111h, eax, ADDR mytext, 123,456,   addr MyVar,addr    MyVar2  ; comment
invoke CreateWindowEx, WS_EX_CLIENTEDGE, Chr$("RichEdit20A"), NULL, reStyle, 0, 0, 1, 1, hWnd, ID_EDIT, wcx.hInstance, NULL


Post yours, too, please :thup:

NoCforMe

This version

  • Allows the following for all 4 arguments:

    • Variable name
    • ADDR variable name
    • register
    • Decimal, hex or binary # (can be negative)
  • Numbers are preserved exactly as given by user
I guess that's all for this version. I'd like to point out that these changes actually reduced the size of the code overall. (No more numeric conversion required, for one thing: numbers are handled as text.)
Some of the recent changes didn't require any change to the code. Instead, all the changes were to data structures, which to me is the beauty of this technique: it's data driven. Yes, it's a bit more complicated than a nested mess of conditional code statements, but once you get it running, it's so easy to expand it or make changes to the "grammar" of what you're parsing.
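
As a rough illustration of the data-driven idea, here is a minimal sketch in Python (rather than assembly, purely to keep it short). The token kind names and the table layout are assumptions made for this example; they are not taken from the attached source.

```python
# Hypothetical sketch: the grammar lives in a table, not in nested if/else code.
# Token kinds and table layout are illustrative, not taken from the attached source.

# Token kinds accepted in any argument position.
ARG = {"IDENT", "ADDR_IDENT", "REGISTER", "NUMBER"}

# One entry per position in the statement; each entry lists the kinds allowed there.
GRAMMAR = [
    {"INVOKE"},        # keyword
    {"IDENT"},         # function name
    {"COMMA"}, ARG,    # argument 1
    {"COMMA"}, ARG,    # argument 2
    {"COMMA"}, ARG,    # argument 3
    {"COMMA"}, ARG,    # argument 4
]


def matches(token_kinds):
    """Return True if the sequence of token kinds follows GRAMMAR exactly."""
    if len(token_kinds) != len(GRAMMAR):
        return False
    return all(kind in allowed for kind, allowed in zip(token_kinds, GRAMMAR))
```

In a layout like this, accepting a new kind of argument means adding one name to `ARG`; the matching code itself does not change.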

I hope someone tests this out and reports back to us.

Next version: remove the fixed requirement for 4 arguments, let it handle n arguments (for some reasonable limit of n, say 8 or 10).
Assembly language programming should be fun. That's why I do it.

jj2007

Are there still special requirements for the invokes?

INVOKE--> code parser demo, version 2
Allows dec/hex/binary #, registers, var or ADDR var for all 4 args.

Enter statement to test: >invoke MyAlgo, eax, "a string", FP4(456.789)

Tokenization error.

Enter statement to test: >invoke MyAlgo, eax, 123, 456

Tokenization error.

Enter statement to test: >InVoke someApi123, 111h, eax, ADDR mytext, 123,456,   addr MyVar,addr    MyVar2  ; comment

Tokenization error.

Enter statement to test: >invoke CreateWindowEx, WS_EX_CLIENTEDGE, Chr$("RichEdit20A"), NULL, reStyle, 0, 0, 1, 1, hWnd, ID_EDIT, wcx.hInstance, NULL

Tokenization error.

NoCforMe

JJ: Aaaaargh. You're pushing the boundaries to the breaking point.

Here's the rulez:

Format:  INVOKE function, arg1, arg2, arg3, arg4
where all args can be any of the following:

  • The name of a variable
  • The address of a variable (ADDR varname)
  • A register name (only 64-bit names allowed)
  • A decimal, hex or binary number (incl. negatives)
All case-insensitive. 4 arguments, no more, no less. No floating-point stuff. NO STRINGS ALLOWED! This is just a demo (besides, how would a string make sense here?)
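
The rules above could be checked roughly like this. It is a Python sketch, not the demo's code: the register list, the identifier pattern and the function names `classify` and `parse_invoke` are all assumptions made for illustration.

```python
import re

# Assumed list of 64-bit register names; the demo's actual list is not shown in the thread.
REGISTERS_64 = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp"} | {
    f"r{n}" for n in range(8, 16)
}

# Hex (111h), binary (1010b) or decimal (123), with an optional leading minus.
NUMBER = re.compile(r"-?(?:[0-9][0-9a-f]*h|[01]+b|[0-9]+)", re.IGNORECASE)
# Simplified identifier pattern (an assumption, not MASM's full rule).
IDENT = re.compile(r"[A-Za-z_@$?][A-Za-z0-9_@$?]*")


def classify(arg):
    """Return the kind of one argument, or None if it breaks the rules."""
    text = arg.strip()
    if text.lower() in REGISTERS_64:
        return "REGISTER"
    if NUMBER.fullmatch(text):
        return "NUMBER"
    parts = text.split()
    if len(parts) == 2 and parts[0].lower() == "addr" and IDENT.fullmatch(parts[1]):
        return "ADDR_IDENT"
    if IDENT.fullmatch(text):
        return "IDENT"
    return None


def parse_invoke(line):
    """Return (function, [kinds]) for a valid 4-argument INVOKE, else None."""
    keyword, _, rest = line.strip().partition(" ")
    if keyword.lower() != "invoke":
        return None
    fields = [field.strip() for field in rest.split(",")]
    if len(fields) != 5 or not IDENT.fullmatch(fields[0]):
        return None
    kinds = [classify(field) for field in fields[1:]]
    return None if None in kinds else (fields[0], kinds)
```

Note that this sketch has no list of reserved words, so a 32-bit register name such as `ecx` would simply be classified as a variable name.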

Curiously, it should accept Chr$("RichEdit20A") as a valid identifier.

Now try it again.

HSE

 :biggrin: a little obvious test.

Enter statement to test: >INVOKE function, arg1, arg2, arg3, arg4

        MOV     R9, arg1
        MOV     R8, arg2
        MOV     RDX, arg3
        MOV     RCX, arg4
        CALL    function

jj2007

That one worked, thanks Hector :thup:

Enter statement to test: >INVOKE function, arg1, arg2, arg3, arg4

        MOV     R9, arg1
        MOV     R8, arg2
        MOV     RDX, arg3
        MOV     RCX, arg4
        CALL    function

Enter statement to test: >INVOKE whatever, asdasd, 123, ecx

Tokenization error.


I just tested my version with invoke strings extracted from your source:
INVOKE  WinMain, EAX
INVOKE  ExitProcess, EAX
INVOKE  StdOut, OFFSET ProgramHeading
INVOKE  StdIn, OFFSET InputBuffer, SIZEOF InputBuffer
INVOKE  StdOut, OFFSET CRLFstr
INVOKE  wsprintf, ADDR buffer, OFFSET CALLfmt,
INVOKE  StdOut, ADDR buffer
INVOKE  strcmpi, OFFSET TextBuffer, [EBX].$T_entry.T_IDptr
INVOKE  strcpy, ECX, OFFSET FnameStorage
INVOKE  strcpy, ECX, OFFSET Var1Storage + 1
INVOKE  strcpy, ECX, OFFSET Var2Storage + 1
INVOKE  strcpy, ECX, OFFSET Var3Storage + 1
INVOKE  strcpy, ECX, OFFSET Var4Storage + 1


And I found a little bug :sad:

It's fixed, see attached version 3.

HSE

Enter statement to test: >INVOKE function, RDX, RCX, RAX, RBX

Tokenization error.


Also, the x64 ABI register order is inverted in the result.

NoCforMe

Quote from: jj2007 on July 11, 2023, 06:21:49 AM
Enter statement to test: >INVOKE whatever, asdasd, 123, ecx

Tokenization error.


Y'see, that one failed for a reason: it violated the grammar defined for the statement. (Which I gave above.) That's exactly what a parser is spozed to do.

NoCforMe

Quote from: HSE on July 11, 2023, 06:24:01 AM
Enter statement to test: >INVOKE function, RDX, RCX, RAX, RBX

Tokenization error.


Also, the x64 ABI register order is inverted in the result.
Aaaaargh; that one shoulda worked. Back to the lab.
Fixed. Updated code attached to previous reply up there.

Regarding the correct order of args and registers: I'll fix that in the next, generalized version.
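
For reference, the Microsoft x64 calling convention passes the first four integer arguments in RCX, RDX, R8 and R9, in that order. A minimal Python sketch of the corrected mapping follows; the function name `emit_call` is made up for this example.

```python
# Microsoft x64 calling convention: integer arguments 1-4, in order.
ARG_REGISTERS = ["RCX", "RDX", "R8", "R9"]


def emit_call(function, args):
    """Return the MOV/CALL lines for a call with up to four register arguments."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("this sketch handles register arguments only")
    lines = [f"MOV     {reg}, {arg}" for reg, arg in zip(ARG_REGISTERS, args)]
    lines.append(f"CALL    {function}")
    return lines
```

One caveat this sketch ignores: if an argument is itself one of the four registers (as in INVOKE function, RDX, RCX, RAX, RBX), an earlier MOV can overwrite a value that a later MOV still needs.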

jj2007

Quote from: NoCforMe on July 11, 2023, 06:25:49 AM
Quote from: jj2007 on July 11, 2023, 06:21:49 AM
Enter statement to test: >INVOKE whatever, asdasd, 123, ecx

Tokenization error.


Y'see, that one failed for a reason: it violated the grammar defined for the statement. (Which I gave above.) That's exactly what a parser is spozed to do.

Quote from: NoCforMe on July 11, 2023, 06:12:09 AM
Here's the rulez:

Format:  INVOKE function, arg1, arg2, arg3, arg4
where all args can be any of the following:

  • The name of a variable
  • The address of a variable (ADDR varname)
  • A register name (only 64-bit names allowed)
  • A decimal, hex or binary number (incl. negatives)
All case-insensitive. 4 arguments, no more, no less. No floating-point stuff. NO STRINGS ALLOWED!

INVOKE whatever, asdasd, 123, ecx failed, but
INVOKE whatever, asdasd, 123, rcx, 456 worked, congrats :thumbsup:

P.S.: Here is my output for the invoke strings extracted from your source, see version 3 above:

-------- Sample text: --------
INVOKE  WinMain, EAX

mov rcx, EAX
call WinMain

-------- Sample text: --------
INVOKE  ExitProcess, EAX

mov rcx, EAX
call ExitProcess

-------- Sample text: --------
INVOKE  StdOut, OFFSET ProgramHeading

mov rcx, OFFSET ProgramHeading
call StdOut

-------- Sample text: --------
INVOKE  StdIn, OFFSET InputBuffer, SIZEOF InputBuffer

mov rcx, OFFSET InputBuffer
mov rdx, SIZEOF InputBuffer
call StdIn

-------- Sample text: --------
INVOKE  StdOut, OFFSET CRLFstr

mov rcx, OFFSET CRLFstr
call StdOut

-------- Sample text: --------
INVOKE  wsprintf, ADDR buffer, OFFSET CALLfmt,

lea rcx, buffer
mov rdx, OFFSET CALLfmt
mov r8, ñm³v
call wsprintf

-------- Sample text: --------
INVOKE  StdOut, ADDR buffer

lea rcx, buffer
call StdOut

-------- Sample text: --------
INVOKE  strcmpi, OFFSET TextBuffer, [EBX].$T_entry.T_IDptr

mov rcx, OFFSET TextBuffer
mov rdx, [EBX].$T_entry.T_IDptr
call strcmpi

-------- Sample text: --------
INVOKE  strcpy, ECX, OFFSET FnameStorage

mov rcx, ECX
mov rdx, OFFSET FnameStorage
call strcpy

-------- Sample text: --------
INVOKE  strcpy, ECX, OFFSET Var1Storage + 1

mov rcx, ECX
mov rdx, OFFSET Var1Storage + 1
call strcpy

-------- Sample text: --------
INVOKE  strcpy, ECX, OFFSET Var2Storage + 1

mov rcx, ECX
mov rdx, OFFSET Var2Storage + 1
call strcpy

-------- Sample text: --------
INVOKE  strcpy, ECX, OFFSET Var3Storage + 1

mov rcx, ECX
mov rdx, OFFSET Var3Storage + 1
call strcpy

-------- Sample text: --------
INVOKE  strcpy, ECX, OFFSET Var4Storage + 1

mov rcx, ECX
mov rdx, OFFSET Var4Storage + 1
call strcpy


I should fix the mov rcx, EAX stuff :rolleyes:
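
One possible fix, sketched in Python: when the source is a 32-bit register, load the 32-bit alias of the destination instead, since writing a 32-bit register zero-extends into the full 64-bit register. The table and the name `load_arg` are assumptions for this example, not part of version 3.

```python
# 32-bit aliases of the four integer argument registers.
DEST_32 = {"rcx": "ecx", "rdx": "edx", "r8": "r8d", "r9": "r9d"}

# 32-bit general-purpose register names.
REGS_32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"} | {
    f"r{n}d" for n in range(8, 16)
}


def load_arg(dest, source):
    """Return one instruction that loads `source` into argument register `dest`."""
    if source.lower() in REGS_32:
        # Writing a 32-bit register zero-extends into the full 64-bit register.
        return f"mov {DEST_32[dest]}, {source}"
    return f"mov {dest}, {source}"
```

If the value is signed, movsxd (for example movsxd rcx, eax) would be the sign-extending alternative; which one is right depends on the parameter's type.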

NoCforMe

So, a couple general questions here:

1. What's the purpose of this conversion, anyhow? I mean apart from being the basis of a parsing demo, which I'm all in on.
What's wrong with using the INVOKE macro as-is? Why would someone want to unroll the code this way?

2. My next version of the demo will generalize it, so it won't be limited to strictly 4 arguments as it is now. But after looking at the x64 calling convention, I think I'll stop here:

  • Allow up to 8 arguments, the first 4 in registers, the last 4 pushed on the stack
  • No floating-point stuff (and therefore no SSE registers, xmm0-xmm3). That's an exercise for the reader, as they say.
I think that's reasonable; after all, this is a parsing demo, not an exhaustive example of x64 usage. (Hell, I don't even use any 64-bit stuff myself!)
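
A rough Python sketch of that plan. It makes two assumptions that differ from the wording above: the stack arguments are stored with MOV into slots the caller has already reserved rather than pushed, and every argument is a register or a small constant (a memory operand could not be stored with a single MOV). The name `emit_invoke` is made up.

```python
ARG_REGISTERS = ["rcx", "rdx", "r8", "r9"]
MAX_ARGS = 8


def emit_invoke(function, args):
    """Return instruction lines for a call with up to MAX_ARGS integer arguments."""
    if len(args) > MAX_ARGS:
        raise ValueError(f"at most {MAX_ARGS} arguments supported")
    lines = []
    for index, arg in enumerate(args):
        if index < len(ARG_REGISTERS):
            lines.append(f"mov {ARG_REGISTERS[index]}, {arg}")
        else:
            # Offset index*8 skips the 32 bytes reserved for the four register arguments,
            # so the fifth argument lands at [rsp+32].
            lines.append(f"mov qword ptr [rsp+{index * 8}], {arg}")
    lines.append(f"call {function}")
    return lines
```

For example, emit_invoke("f", ["1", "2", "3", "4", "5"]) ends with a store to [rsp+32] followed by the call.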

Anyhow, next (and probably last) version coming soon ...

HSE

Quote from: NoCforMe on July 11, 2023, 06:59:36 AM
basis of a parsing demo, which I'm all in on.

:thumbsup: Too much complexity is hard to follow. Perhaps the demo only needs to be difficult enough to serve as the third example in your essay, and no more.


Quote from: NoCforMe on July 11, 2023, 06:59:36 AM
I don't even use any 64-bit stuff myself

Perhaps you haven't seen the masm64 SDK invoke macro. It can be written a little more beautifully:
    invoke MACRO fname:REQ,args:VARARG
      procedure_call fname,args
    ENDM

    ;; REGISTER and STACKARG are helper macros that are not shown in this post.
    procedure_call MACRO fname:REQ,a1:VARARG

      LOCAL lead,wrd2,ssize,sreg,svar

      arg1_n = 0                      ;; zero-based argument index
      FOR arg2, <a1>

        ;; **************************
        ;; first 4 register arguments
        ;; **************************
        IF arg1_n eq 0
          REGISTER arg2,cl,cx,ecx,rcx,xmm0
        ENDIF
        IF arg1_n eq 1
          REGISTER arg2,dl,dx,edx,rdx,xmm1
        ENDIF
        IF arg1_n eq 2
          REGISTER arg2,r8b,r8w,r8d,r8,xmm2
        ENDIF
        IF arg1_n eq 3
          REGISTER arg2,r9b,r9w,r9d,r9,xmm3
        ENDIF

        ;; **************************
        ;; following stack arguments
        ;; **************************
        IF arg1_n gt 3
          STACKARG arg2,arg1_n*8      ;; stack slot offset: index * 8 bytes
        ENDIF

        arg1_n = arg1_n + 1
      ENDM

      call fname

    ENDM


Not so ugly spaghetti :biggrin:

NoCforMe

Quote from: HSE on July 11, 2023, 07:24:01 AM
Quote from: NoCforMe on July 11, 2023, 06:59:36 AM
basis of a parsing demo, which I'm all in on.

:thumbsup: Too much complexity is hard to follow. Perhaps the demo only needs to be difficult enough to serve as the third example in your essay, and no more.
Well, I apologize for that. But the complexity here follows from the requirements of the demo, which are not trivial. My hope is that the underlying concepts--using an FSA for tokenization and a linked list for parsing--will somehow reveal themselves to the curious here despite the complexity. And hey, it's not that complicated!
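
To give a flavour of the FSA idea, here is a deliberately tiny table-driven tokenizer in Python. The states, character classes and transition table are invented for this example and are much simpler than the demo's (no negative numbers, no ADDR handling).

```python
# States and character classes are illustrative; the demo's real tables are not shown here.
START, IN_WORD, IN_NUMBER = "START", "IN_WORD", "IN_NUMBER"


def char_class(ch):
    """Map a character to the class used as a transition-table key."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch == ",":
        return "comma"
    if ch.isspace():
        return "space"
    return "other"


# (state, character class) -> next state; a missing entry is a tokenization error.
TRANSITIONS = {
    (START, "letter"): IN_WORD,
    (START, "digit"): IN_NUMBER,
    (START, "space"): START,
    (START, "comma"): START,
    (IN_WORD, "letter"): IN_WORD,
    (IN_WORD, "digit"): IN_WORD,
    (IN_WORD, "space"): START,
    (IN_WORD, "comma"): START,
    (IN_NUMBER, "digit"): IN_NUMBER,
    (IN_NUMBER, "letter"): IN_NUMBER,  # allows suffixes such as 111h or 1010b
    (IN_NUMBER, "space"): START,
    (IN_NUMBER, "comma"): START,
}


def tokenize(text):
    """Split text into (kind, text) tokens by walking the transition table."""
    tokens, state, current = [], START, ""
    for ch in text + " ":  # the trailing space flushes the last token
        cls = char_class(ch)
        next_state = TRANSITIONS.get((state, cls))
        if next_state is None:
            raise ValueError("Tokenization error.")
        if state != START and next_state == START:
            tokens.append(("WORD" if state == IN_WORD else "NUMBER", current))
            current = ""
        if cls == "comma":
            tokens.append(("COMMA", ","))
        elif next_state != START:
            current += ch
        state = next_state
    return tokens
```

A double quote has no entry in the table, so a quoted string raises the error, which is the same kind of rejection the demo reports as "Tokenization error."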
Quote
Not so ugly spaghetti :biggrin:

Yes, but that isn't a parser. Mine is.

HSE

Quote from: NoCforMe on July 11, 2023, 07:31:45 AM
Yes, but that isn't a parser. Mine is.

  :thumbsup: Just a kind of "lexical analysis" (MASM has the macro tokenizer)