I've written a paper that covers this topic, attached below.
This may or may not be what you're looking for. This is what I consider an excellent technique for parsing just about any kind of text. I've used it many times. It's based on assembly language but could be adapted for any other language. While it may look a bit complex, it's actually pretty simple and straightforward, once you get the basic concept.
There are actually two distinct phases here. The first is "tokenization" (also called "lexical analysis"): scanning the text stream and separating its elements into "tokens". The second is the actual parsing process, where the stream of tokens is analyzed and the parser takes actions depending on which tokens it sees.
Anyhow, you might give it a look-see, find out if this might work for you. Like I say, it's a very flexible tool. The really nice thing about it is that it isn't a mess of conditional statements, all nested and snarled up like a box of snakes: it's a table-driven process. Once you've diagrammed your parsing task and created the tokenization table, the code practically writes itself.
Note: Since the attachment (PDF) was larger than 512 KB, I had to break it into 2 pieces. The 2nd part is attached to the reply to this post.
Here's the 2nd part of the PDF attachment. If this looks like something you'd want to use, or if you have questions, LMK.
Quote from: NoCforMe on July 07, 2023, 12:27:01 PM
I've written a paper that covers this topic, attached below.
Looks like impressive work :thumbsup:
Quote from: NoCforMe on July 07, 2023, 12:27:01 PM
I've written a paper that covers this topic, attached below.
You put a lot of work into that :thumbsup:
However, it seems that sepult has lost interest :rolleyes:
Quote from: jj2007 on July 08, 2023, 01:06:29 AM
Quote from: NoCforMe on July 07, 2023, 12:27:01 PM
I've written a paper that covers this topic, attached below.
You put a lot of work into that :thumbsup:
However, it seems that sepult has lost interest :rolleyes:
That's OK; I realize this isn't for everybody. Eventually someone will come along here who finds this at least somewhat intriguing. We'll only have to wait, say, a couple years ...
Seriously, I'd be really stoked to see someone use this, since it has worked so well for me over the decades.
Interesting. It's the first step toward writing translators, preprocessors, converters, scripts, and so on.
Basically, you can specify the characters allowed in the text, the characters not allowed, the character that marks the end of a line, and the character(s) the tokenizer uses as separators. From there, you register the words that will be returned as identifiers.
Next comes precedence, i.e. the priority of some symbols over others. Extra functions can handle conversions, such as turning a hexadecimal string into a number.
The glib library has a lexical scanner; I used it a lot when I migrated to another OS.
Good job, reminded me of the red dragon book.
(https://m.media-amazon.com/images/I/51FWXX9KWVL._SX384_BO1,204,203,200_.jpg)
I got the core idea for my parsing scheme from a computer science book I read back in the 1980s, forget exactly which one (it wasn't Knuth, which I also took a look at), which described the workings of a finite-state automaton (FSA). It was one of the few things in the book that wasn't completely over my head at the time.
Art Of Assembly? It has a nice chapter about finite state machines.
Quote from: HSE on July 08, 2023, 06:15:12 AM
Art Of Assembly? It has a nice chapter about finite state machines.
No, the book had nothing to do with any language; it was a general computer science text.
Dang, wish I had it now; I might be able to understand more of it. It covered stuff like hashing, sparse-text tables, compiler construction, etc. Plus the usual sort algorithms, etc.
Quote from: NoCforMe on July 08, 2023, 06:20:09 AM
Dang, wish I had it now; I might be able to understand more of it. It covered stuff like hashing, sparse-text tables, compiler construction, etc. Plus the usual sort algorithms, etc.
I think it could be Algorithms by Robert Sedgewick, Brown University, 1983-1984.
Great book.
Quote from: HSE on July 08, 2023, 06:15:12 AM
Art Of Assembly? It has a nice chapter about finite state machines.
AOA also has a nice chapter dealing with boolean operators.
Quote from: mineiro on July 08, 2023, 07:08:55 AM
Quote from: NoCforMe on July 08, 2023, 06:20:09 AM
Dang, wish I had it now; I might be able to understand more of it. It covered stuff like hashing, sparse-text tables, compiler construction, etc. Plus the usual sort algorithms, etc.
I think it could be Algorithms by Robert Sedgewick, Brown University, 1983-1984.
Great book.
Could be.
If you really want the book on computer programming, that would be Donald Knuth's multivolume The Art of Computer Programming. I only wish I could understand more than about 10% of it.
Quote from: HSE on July 08, 2023, 06:15:12 AM
Art Of Assembly? It has a nice chapter about finite state machines.
I wonder how close their technique for using them is to mine. Wouldn't be surprised if it was similar; only so many ways to skin a cat. (Sorry, kitty!)
Hey, I'll make this offer: if anyone (including the OP) posts the spec for something they want parsed, I'll create a parser for it using my scheme and post the code here. (Assuming it's not too complicated!) Be sure to specify exactly the text you need to be interpreted.
I don't know what OP is. Here "opa" is a big person with some kind of mental deficiency but a good heart (not used much these days).
First we have to test your examples :thumbsup:
OP = original poster (they who started the thread)
Quote from: NoCforMe on July 08, 2023, 02:02:57 PM
Hey, I'll make this offer: if anyone (including the OP) posts the spec for something they want parsed, I'll create a parser for it using my scheme and post the code here. (Assuming it's not too complicated!) Be sure to specify exactly the text you need to be interpreted.
I can offer a haystack, i.e. a fat text to be parsed: http://www.jj2007.eu/Bible.zip
Quote from: jj2007 on April 28, 2023, 04:35:21 PM
UnzipFile (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1234):
include \masm32\MasmBasic\MasmBasic.inc
Init
UnzipInit "http://www.jj2007.eu/Bible.zip" ; file or URL
UnzipFile(0, "C:\Masm32") ; extract C:\Masm32\Bible.txt
EndOfCode
Hi David,
Quote from: NoCforMe on July 08, 2023, 02:02:57 PM
Hey, I'll make this offer: if anyone (including the OP) posts the spec for something they want parsed, I'll create a parser for it using my scheme and post the code here.
I have request :biggrin:
Can you post the complete first example? From what I'm reading, the flow gets stuck forever in the last column:
gt20:
jmp qword ptr [rbx + TokenParseTbl + $tokenAnyOffset]
Quote from: jj2007 on July 08, 2023, 07:22:12 PM
Quote from: NoCforMe on July 08, 2023, 02:02:57 PM
Hey, I'll make this offer: if anyone (including the OP) posts the spec for something they want parsed, I'll create a parser for it using my scheme and post the code here. (Assuming it's not too complicated!) Be sure to specify exactly the text you need to be interpreted.
I can offer a haystack, i.e. a fat text to be parsed: http://www.jj2007.eu/Bible.zip
Quote from: jj2007 on April 28, 2023, 04:35:21 PM
UnzipFile (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1234):
include \masm32\MasmBasic\MasmBasic.inc
Init
UnzipInit "http://www.jj2007.eu/Bible.zip" ; file or URL
UnzipFile(0, "C:\Masm32") ; extract C:\Masm32\Bible.txt
EndOfCode
OK, that's fine, but what's the particular needle you're searching for in that haystack?
Quote from: NoCforMe on July 09, 2023, 05:22:29 AM
OK, that's fine, but what's the particular needle you're searching for in that haystack?
For example, all phrases that start with "Satan"?
The problem here is that the OP has lost interest, and we are just playing around.
Quote from: jj2007 on July 09, 2023, 05:33:40 AM
Quote from: NoCforMe on July 09, 2023, 05:22:29 AM
OK, that's fine, but what's the particular needle you're searching for in that haystack?
For example, all phrases that start with "Satan"?
OK, I may tackle that after I post the example that Hector requested.
Quote
The problem here is that the OP has lost interest, and we are just playing around.
Hmm, not really a problem. We're allowed to play around if we like.
After all, didn't you say that assembly should be fun?
Quote from: HSE on July 09, 2023, 03:24:52 AM
Hi David,
Quote from: NoCforMe on July 08, 2023, 02:02:57 PM
Hey, I'll make this offer: if anyone (including the OP) posts the spec for something they want parsed, I'll create a parser for it using my scheme and post the code here.
I have request :biggrin:
Can you post the complete first example? From what I'm reading, the flow gets stuck forever in the last column:
gt20:
jmp qword ptr [rbx + TokenParseTbl + $tokenAnyOffset]
Here's the first example fleshed out. I left out some code that's needed (string matching and numeric conversion), but this should give the general idea. (You also need to provide a GNC(), but this is trivial, just get the next byte from the text buffer.)
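$exitCodeSuccess EQU 0
$exitCodeError EQU -1
; Character constants used below (not in the original listing; standard ASCII values).
; $EOL is assumed to be line feed, since carriage returns are thrown away.
$tab EQU 9
$CR EQU 13
$EOL EQU 10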
.data
TextBufferPtr DD ?
;====================================
; The Tokenization Table
;
; This is what drives the whole process
;====================================
TokenParseTbl LABEL DWORD
; [#] = , EOL [any]
; ------------------------------------------
DD _pnX, _pnX, _pnX, _pnX, _pnA ;0
DD _pnX, _pnB, _pnX, _pnX, _pn1 ;1
DD _pnC, _pnX, _pnX, _pnX, _pnX ;2
DD _pn3, _pnX, _pnD, _pnX, _pnX ;3
DD _pnE, _pnX, _pnX, _pnX, _pnX ;4
DD _pn5, _pnX, _pnX, _pnF, _pnX ;5
TokenParseChars DB '=', ',', $EOL
$numParseChars EQU $ - TokenParseChars
$tokenRow1 EQU ($numParseChars + 2) * 4 ;DWORD offset
$tokenRow2 EQU $tokenRow1 * 2
$tokenRow3 EQU $tokenRow1 * 3
$tokenRow4 EQU $tokenRow1 * 4
$tokenRow5 EQU $tokenRow1 * 5
$tokenAnyOffset EQU ($numParseChars + 1) * 4
TextBuffer DB 256 DUP(?)
.code
;====================================
; Tokenizer()
;
; Reads the text stream, breaks it into tokens.
; On return from a successful tokenization,
; the values "x" and "y" are stored.
;
; Returns:
; $exitCodeSuccess or
; $exitCodeError
;====================================
Tokenizer PROC
XOR EBX, EBX ;Start @ row 0.
parse0: CALL GNC ;Get next char. from buffer.
CMP AL, $CR ;Throw away carriage returns.
JE parse0
CMP AL, $tab ;Magically turn tabs into spaces.
JNE gt3
MOV AL, ' '
; 1. Weed out numeric digits:
gt3: CMP AL, '0'
JB gt10
CMP AL, '9'
JA gt10
JMP DWORD PTR [EBX + TokenParseTbl] ;[#] is 1st col. in table row.
; 2. Try to match any parsing characters:
gt10: LEA EDI, TokenParseChars
MOV EDX, EDI ;Save pointer to list of chars.
MOV ECX, $numParseChars
REPNE SCASB
JNE gt20 ;No match, go to "any" col.
SUB EDI, EDX ;Get offset from start of list.
SHL EDI, 2 ;Convert to DWORD offset.
JMP DWORD PTR [EBX + EDI + TokenParseTbl]
; No match, so go to "any char." column:
gt20: JMP DWORD PTR [EBX + TokenParseTbl + $tokenAnyOffset]
;====================================
; Tokenizing Nodes
;====================================
;***** Set up for ID storage: *****
_pnA LABEL NEAR
MOV TextBufferPtr, OFFSET TextBuffer
MOV EBX, $tokenRow1
; ... fall through to ...
;***** Store ID char.: *****
_pn1 LABEL NEAR
MOV EDX, TextBufferPtr
MOV [EDX], AL
INC TextBufferPtr
JMP parse0
_pn2 LABEL NEAR ;"Do-nothing" node.
JMP parse0
;***** Match ID against string ("hotspot"): *****
_pnB LABEL NEAR
; (insert code to do string matching here)
; Will also have to return an error if the string doesn't match "hotspot".
MOV EBX, $tokenRow2
JMP parse0
;***** Set up for # storage: *****
; (we use the same buffer for text and # storage)
_pnC LABEL NEAR
MOV TextBufferPtr, OFFSET TextBuffer
MOV EBX, $tokenRow3
; ... fall through to ...
;***** Store # char.: *****
; (notice this is the same as _pn1; we could re-use that node
; and save some space, but let's keep this for clarity)
_pn3 LABEL NEAR
MOV EDX, TextBufferPtr
MOV [EDX], AL
INC TextBufferPtr
JMP parse0
;***** Save "x" value: *****
_pnD LABEL NEAR
; (insert code to convert ASCII digits to binary and store it
; as the value for "x")
MOV EBX, $tokenRow4
JMP parse0
;***** Set up for # storage: *****
_pnE LABEL NEAR
MOV TextBufferPtr, OFFSET TextBuffer
MOV EBX, $tokenRow5
; ... fall through to ...
;***** Store # char.: *****
_pn5 LABEL NEAR
MOV EDX, TextBufferPtr
MOV [EDX], AL
INC TextBufferPtr
JMP parse0
;***** Save "y" value, exit w/success: *****
_pnF LABEL NEAR
; (insert code to convert ASCII digits to binary and store it
; as the value for "y")
MOV EAX, $exitCodeSuccess
RET
;***** Return error: *****
_pnX LABEL NEAR
MOV EAX, $exitCodeError
RET
Tokenizer ENDP
One thing to realize is how easily this code flows from the diagram. Look at the diagram for that 1st example, and you should be able to see how the code for the tokenization "nodes" follows it. That's what I really like about this method, it kind of writes itself.
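For readers who don't speak assembly, here is a rough Python sketch of the same table-driven idea, applied to the "hotspot=<x>,<y>" statement. The rows and columns follow the table in the listing above. The names (`TABLE`, `classify`, `tokenize_hotspot`) are made up for illustration, and the string match and numeric conversion that the listing leaves out are filled in with the simplest thing that works, so treat it as a sketch rather than a translation.

```python
# Column indexes: digit, '=', ',', end of line, anything else.
DIGIT, EQUALS, COMMA, EOL, ANY = range(5)

def classify(ch):
    """Map a character to a table column."""
    if "0" <= ch <= "9":
        return DIGIT
    return {"=": EQUALS, ",": COMMA, "\n": EOL}.get(ch, ANY)

# Each entry is (action, next_row); X marks an error (the _pnX node).
# Actions: "start" = reset buffer and store char, "store" = append char,
# "id" = check the identifier, "x" / "y" = convert the buffer to a number.
X = None
TABLE = [
    # digit         '='        ','        EOL        any
    [X,             X,         X,         X,         ("start", 1)],  # row 0
    [X,             ("id", 2), X,         X,         ("store", 1)],  # row 1
    [("start", 3),  X,         X,         X,         X],             # row 2
    [("store", 3),  X,         ("x", 4),  X,         X],             # row 3
    [("start", 5),  X,         X,         X,         X],             # row 4
    [("store", 5),  X,         X,         ("y", 5),  X],             # row 5
]

def tokenize_hotspot(text):
    """Return (x, y) for 'hotspot=<x>,<y>', or None on a syntax error."""
    row, buf, values = 0, [], {}
    for ch in text + "\n":          # make sure the text ends with an EOL
        if ch == "\r":              # throw away carriage returns
            continue
        if ch == "\t":              # turn tabs into spaces
            ch = " "
        entry = TABLE[row][classify(ch)]
        if entry is None:
            return None
        action, row = entry
        if action == "start":
            buf = [ch]
        elif action == "store":
            buf.append(ch)
        elif action == "id":
            if "".join(buf) != "hotspot":
                return None
        else:                       # "x" or "y": convert the stored digits
            values[action] = int("".join(buf))
            if action == "y":
                return values["x"], values["y"]
    return None

print(tokenize_hotspot("hotspot=12,34"))
```

The loop body never changes when the grammar does; only `TABLE` and the handful of actions do, which is the property the thread keeps coming back to.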
Ok, there are no phrases that start with Satan in Bible.txt (http://www.jj2007.eu/Bible.zip), so I used The Lord instead :biggrin:
Source:
include \masm32\MasmBasic\MasmBasic.inc
Init
Let esi=FileRead$("bible.txt")
xor ecx, ecx
.While 1
inc ecx ; increase start position
.Break .if !Instr_(ecx, esi, "The Lord", 4) ; full word, case-sensitive
mov edi, eax ; save start
lea ecx, [edx+8] ; advance index
.Repeat
dec eax
mov dl, [eax]
.Until dl==33 || dl=="?" || dl=="." || dl==10 || dl>"@" ; end.!? Satan
.if Zero?
mov eax, edi ; start
.Repeat
inc eax
mov dl, [eax]
.Until dl==33 || dl=="?" || dl=="." || dl==13 || !dl ; end of phrase
sub eax, edi
inc eax ; get the dot, too
.if eax>80
m2m eax, 80 ; too long for display
.endif
PrintLine Left$(edi, eax)
.endif
.Endw
Inkey "done"
EndOfCode
Output:
The Lord work a care and conscience in us to know Him and serve Him, that we may
The Lord of heaven and earth bless Your Majesty with many and happy days, that,
The Lord shall laugh at him: for he seeth that his day is coming.
The Lord gave the word: great [was] the company of those that published [it].
The Lord said, I will bring again from Bashan, I will bring [my people] again fr
The Lord at thy right hand shall strike through kings in the day of his wrath.
The Lord sent a word into Jacob, and it hath lighted upon Israel.
The Lord GOD hath given me the tongue of the learned, that I should know how to
The Lord GOD hath opened mine ear, and I was not rebellious, neither turned away
The Lord GOD which gathereth the outcasts of Israel saith, Yet will I gather [ot
The Lord hath trodden under foot all my mighty [men] in the midst of me: he hath
The Lord hath swallowed up all the habitations of Jacob, and hath not pitied: he
The Lord was as an enemy: he hath swallowed up Israel, he hath swallowed up all
The Lord hath cast off his altar, he hath abhorred his sanctuary, he hath given
The Lord GOD hath sworn by his holiness, that, lo, the days shall come upon you,
The Lord GOD hath sworn by himself, saith the LORD the God of hosts, I abhor the
The Lord then answered him, and said, [Thou] hypocrite, doth not each one of you
The Lord [is] at hand.
The Lord [be] with you all.
The Lord give mercy unto the house of Onesiphorus; for he oft refreshed me, and
The Lord grant unto him that he may find mercy of the Lord in that day: and in h
The Lord Jesus Christ [be] with thy spirit.
The Lord knoweth how to deliver the godly out of temptations, and to reserve the
The Lord is not slack concerning his promise, as some men count slackness; but i
done
Quote from: NoCforMe on July 09, 2023, 06:27:30 AM
it kind of writes itself.
:biggrin: I'm very used to "a mess of conditional statements"
Thanks, that will help :thumbsup:
HSE
JJ, try to think of a better example. What you showed is basically a text search; how about something where you're extracting values from a text construct, along the lines of my "hotspot=<x>,<y>", but a little less trivial. Something I can get my parsing teeth into.
Sorry, I don't understand what you mean - maybe you should give it a try?
Honestly, I am curious to see a concrete example of your approach.
:biggrin: that looks better in code:
;***** Set up for ID storage: *****
_pnA LABEL NEAR
lea rcx, TextBuffer
mov TextBufferPtr, rcx
mov rbx, $tokenRow1
; ... fall through to ...
because the article says:
***** Set up for ID storage: *****
_pnA LABEL NEAR
MOV TextBufferPtr, OFFSET TextBuffer
MOV EBX, $tokenRow1
JMP parse0
which made me lose the path :rolleyes:
Yes, that is a little misleading. I should probably provide the entire example in the article to make that more clear.
I have a challenge for you, sir NoCforMe. I understand how powerful what you posted is. This could be a great project if people want to create plugins for text editors.
Translate the line below:
invoke function, addr something, 1, 2, 3
to strings:
mov r9,3
mov r8,2
mov rdx,1
lea rcx, addr something
call function
Rules:
addr inside invoke ID should be translated as "lea" instead of "mov".
Really sorry, I can't edit my post to fix my error:
So,
lea rcx, addr something
to:
lea rcx, something
Well, 7 beer effect.
Excellent challenge! Just what the doctor ordered.
I'm on it.
First task: clearly identify the problem.
To generalize, we have as input:
INVOKE f, s, a1, a2, a3
where
- f can be:
  - the name of a function
  - a register holding the function's address
  - a variable holding the function's address
- s can be:
  - the address of a variable (ADDR var)
  - the value of a variable (var)
  - a register holding a value
- a1, a2 and a3 can be:
  - the address of a variable (ADDR var)
  - the value of a variable (var)
  - a register holding a value
OK? So far so good. We want to transform this to the form you gave in your challenge.
Since the first task of parsing is tokenization, we need to define the universe of tokens here:
- unknown IDs
- numbers (decimal or hex)
- known IDs:
- comma
- space (space or tab)
- EOL (end-of-line marker)
That's it. We don't need to include all the possible register names because those will simply be treated as "unknown IDs", just like any other variable.
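As a rough illustration of that token universe, here is a small Python sketch that splits a statement into those token types. It is not the GetNextToken() routine discussed in this thread: it uses a regular expression instead of a state table, the names (`KNOWN_IDS`, `TOKEN_RE`, `tokenize`) are invented, and the list of known IDs (INVOKE and ADDR) is an assumption based on the challenge statement.

```python
import re

KNOWN_IDS = {"INVOKE", "ADDR"}   # assumed list of known identifiers

# One alternative per token type. Spaces and tabs are matched but skipped.
# A number is decimal, or hex with a leading digit and a trailing 'h'.
TOKEN_RE = re.compile(r"""
    (?P<space>[ \t]+)
  | (?P<comma>,)
  | (?P<number>(?:[0-9][0-9A-Fa-f]*[Hh]|[0-9]+)(?![A-Za-z0-9_]))
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
""", re.VERBOSE)

def tokenize(line):
    """Return a list of (type, text) pairs, ending with ('eol', '')."""
    tokens, pos = [], 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:
            raise ValueError(f"tokenization error at column {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "space":
            continue
        if kind == "ident":
            # Register names are not special here: they come out as "unknown".
            kind = "known" if text.upper() in KNOWN_IDS else "unknown"
        tokens.append((kind, text))
    tokens.append(("eol", ""))
    return tokens

for token in tokenize("invoke function, addr something, 1, 0ffffh"):
    print(token)
```

Like any pure tokenizer, this accepts sequences that make no sense as a statement; deciding what may follow what is the parser's job.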
I'll work on the tokenizer and get back when it's done.
Sir, you went further than I asked; keep things simple so people can understand how powerful a lexical scanner is.
If you want to create what you proposed, then release two versions.
Leave the rest to modular programming, people's cognitive parallax.
Actually it's no more complex than your original spec, except for handling hex #s, which is only slightly more complicated.
Here's my FSA diagram for the tokenizer. This is probably the most important part of the process--drawing the tokenizer. It needs to be done on paper. All the code you'll write will come from this picture.
Should be self-explanatory. This will become a subroutine called GetNextToken(). The top part (nodes A, 1 and B) tokenizes identifiers. Nodes G and H return EOL and comma respectively. The rest of it is for numbers (decimal or hex).
yes, like that.
Tabs (09h) and spaces (20h) inside this invoke logic (parser) should be treated as characters to ignore (just advance the pointer). The "," symbol is the separator between tokens.
Inside an invoke, if the last valid character before the EOL is "\", continue on the next line (ignore that EOL) until a line does not end in "\" (the real EOL).
Some people write:
invoke function,\ ;some comments, there's tabs and spaces separating this
addr something,\ ;another
1,\
2,\
3
That's all. If a user types:
db "invoke something"
then that is outside this "scope": because of the quotes (symbol precedence), the invoke is treated as a literal. That belongs to some other scope, so there is nothing to do here.
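The continuation rule can be handled before tokenization even starts. The following Python sketch joins physical lines into logical statements under the rules just described: comments start at ";", a trailing "\" continues the statement on the next line, and semicolons inside double quotes are left alone. The function names are hypothetical, and treating only double quotes as string delimiters is a simplifying assumption.

```python
def strip_comment(line):
    """Cut a trailing ';' comment, ignoring semicolons inside double quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            return line[:i]
    return line

def join_continued(lines):
    """Merge physical lines ending in a backslash into logical statements."""
    statements, pending = [], ""
    for raw in lines:
        text = strip_comment(raw).strip()      # drop comment, tabs, spaces
        if text.endswith("\\"):
            pending += text[:-1].rstrip() + " "
            continue                           # not the real EOL yet
        pending += text
        if pending:
            statements.append(pending)
        pending = ""
    if pending.strip():                        # input ended mid-continuation
        statements.append(pending.strip())
    return statements

source = [
    "invoke function,\\ ;some comments, there's tabs and spaces",
    "addr something,\\ ;another",
    "1,\\",
    "2,\\",
    "3",
]
print(join_continued(source))
```

With the sample above, the five physical lines come out as the single statement "invoke function, addr something, 1, 2, 3", which can then go through the tokenizer unchanged.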
Here's a demo tokenizer, attached below. It's a console program that prompts you for a statement, then tokenizes it (or at least attempts to), and displays the results. Was kinda fun to make and to use. Try it out.
You'll notice that it does a great job of recognizing tokens, but it's dumb as a stump when it comes to making any kind of sense out of what you type. All of the following will successfully pass through it:
- invoke function, addr sam, edx, 0ffffh, addr var
- sam invoke eax, , a box of pills
- eax 2 3 4 invoke,
That's because it's just a dumb tokenizer that doesn't know anything about the "grammar" of the statement it's working on, meaning what needs to go where for the statement to make sense. That'll happen during the next task. All it knows is how to break apart the statement into its "atomic" parts. But we've accomplished a lot so far. Stay tuned.
The two important things to look at here in the source are the tokenization table (TokenParseTbl) and associated data, and the GetNextToken() subroutine which does the tokenization. The rest of the stuff is support functions: getting characters out of the buffer, matching IDs, converting #s, uppercasing characters, etc.
OK, last thing for tonight: here's the parsing sequence for all those tokens we can now extract. This defines the "grammar" of the statement. I've followed Hector's "challenge" here, but expanded a bit on the last 3 arguments, which can be any of a variable name, the ADDR of a variable, a number (decimal or hex), or the name of a register. It's very easy to accommodate these without a ton of spaghetti code, since this thing is data-driven as you'll see. But first I have to get some sleep ...
Again, it's very important that you put this down on paper first before typing one single character into your code.
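To show what "grammar as data" can look like, here is a hedged Python sketch of a parser that walks a list of token types through a state table. The state names and token-type names are invented for this example, and unlike the demo it accepts any number of arguments rather than exactly four.

```python
# Grammar as data: state -> {token type: next state}.
# "id" stands for any unknown identifier (variable, function or register).
GRAMMAR = {
    "start":     {"INVOKE": "func"},
    "func":      {"id": "after_arg"},
    "after_arg": {"comma": "arg", "eol": "done"},
    "arg":       {"ADDR": "addr_var", "id": "after_arg", "number": "after_arg"},
    "addr_var":  {"id": "after_arg"},
}

def follows_grammar(token_types):
    """Return True if the sequence of token types is a valid statement."""
    state = "start"
    for tok in token_types:
        # A missing state or token type means the grammar was violated.
        state = GRAMMAR.get(state, {}).get(tok)
        if state is None:
            return False
    return state == "done"

good = ["INVOKE", "id", "comma", "ADDR", "id", "comma", "number", "eol"]
bad = ["id", "INVOKE", "id", "comma", "comma", "eol"]
print(follows_grammar(good), follows_grammar(bad))
```

Allowing a new kind of argument is then one more entry in the "arg" row; the loop itself stays untouched.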
My take on the tokeniser (full source of 76 lines attached). There is a text file with samples attached, edit at your convenience; it contains lines like this:
InVoke someApi123, 111h, eax, ADDR mytext, 123,456, addr MyVar,addr MyVar2 ; comment
Here is the tokeniser:
GetToken proc ; expects string in esi
.Repeat
lodsb
.Until al>="@" || (al>="0" && al<="9") || al==59
dec esi ; esi on someApi
mov eax, esi
.Repeat
inc eax
mov dl, [eax]
.Until dl=="," || !dl || dl==59 || dl==13 ; 59=comment
push edx ; last byte (could be zero or carriage return)
.Repeat
dec eax
mov dl, [eax]
.Until dl>="@" || (dl>="0" && dl<="9") ; eax points to trailing bytes
inc eax
push eax ; addr last valid byte
sub eax, esi
xchg eax, ecx
Let t$(tCt)=Left$(esi, ecx)
inc tCt
pop esi
pop eax
and eax, 11110010b ; stop for Cr=1101b and nullbyte but not for comma or comment
ret
GetToken endp
Sample output (yes, the assembler would throw two errors):
-------- Sample text: --------
invoke MyAlgo, eax, 123, 456
mov rcx, eax
mov rdx, 123
mov r8, 456
call MyAlgo
-------- Sample text: --------
InVoke someApi123, 111h, eax, ADDR mytext, 123,456, addr MyVar,addr MyVar2 ; comment
line two
mov rcx, 111h
mov rdx, eax
lea r8, mytext
mov r9, 123
push 456
lea rax, MyVar
push rax
lea rax, MyVar2
push rax
call someApi123
Quote from: NoCforMe on July 09, 2023, 06:27:30 AM
One thing to realize is how easily this code flows from the diagram.
Mmm :biggrin:
How it move from B to C? (first example)
Quote from: jj2007 on July 09, 2023, 08:22:26 PM
Sample output (yes, the assembler would throw two errors):
Or maybe three:
mov rcx, eax
line two
push 456
push rax
push rax
3 items pushed on the stack (dword, qword, qword); the rsp register will be unaligned before the call instruction, even supposing expansion to 3 qwords.
Quote from: mineiro on July 10, 2023, 02:31:24 AM
Quote from: jj2007 on July 09, 2023, 08:22:26 PM
Sample output (yes, the assembler would throw two errors):
Or maybe three:
mov rcx, eax
line two
push 456
push rax
push rax
3 items pushed on the stack (dword, qword, qword); the rsp register will be unaligned before the call instruction, even supposing expansion to 3 qwords.
- The "line two" was just a test if the tokeniser recognises a carriage return correctly.
- A push 123 is not DWORD, it's qword.
- Re stack alignment, in real code, a CreateWindowEx looks like this:
invoke CreateWindowEx, WS_EX_CLIENTEDGE, Chr$("RichEdit20A"), NULL, reStyle, 0, 0, 1, 1, hWnd, ID_EDIT, wcx.hInstance, NULL
000000014000117C | 48:836424 58 00 | and [rsp+58],0 |
0000000140001182 | 4C:8B15 C31E0000 | mov r10,[14000304C] |
0000000140001189 | 4C:895424 50 | mov [rsp+50],r10 |
000000014000118E | 48:C74424 48 6F000000 | mov [rsp+48],6F | 6F:'o'
0000000140001197 | 44:8B55 10 | mov r10d,[rbp+10] |
000000014000119B | 4C:895424 40 | mov [rsp+40],r10 |
00000001400011A0 | 48:C74424 38 01000000 | mov [rsp+38],1 |
00000001400011A9 | 48:C74424 30 01000000 | mov [rsp+30],1 |
00000001400011B2 | 48:836424 28 00 | and [rsp+28],0 |
00000001400011B8 | 48:836424 20 00 | and [rsp+20],0 |
00000001400011BE | 41:B9 C4013050 | mov r9d,503001C4 |
00000001400011C4 | 45:33C0 | xor r8d,r8d | r8d:"flags=3"
00000001400011C7 | 48:8D15 E11E0000 | lea rdx,[1400030AF] | 00000001400030AF:"RichEdit20A"
00000001400011CE | B9 00020000 | mov ecx,200 |
00000001400011D3 | FF15 A3330000 | call [<&CreateWindowExA>] |
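The listing follows the Windows x64 calling convention: the first four arguments go in rcx, rdx, r8 and r9, and the rest are stored in stack slots starting at [rsp+20h] instead of being pushed. A translator could emit that pattern along the lines of this Python sketch. The function name and the use of r10 as a scratch register (borrowed from the listing) are illustrative choices, and the sketch assumes the caller's prologue has already reserved and aligned enough stack space.

```python
ARG_REGS = ["rcx", "rdx", "r8", "r9"]   # first four integer/pointer arguments
SCRATCH = "r10"                          # scratch register, as in the listing

def emit_invoke(func, args):
    """Return assembly lines for 'invoke func, args...' (Win64 convention)."""
    lines = []
    # Work right to left, like the listing: stack slots first, registers last.
    for i in reversed(range(len(args))):
        parts = args[i].split(None, 1)
        is_addr = len(parts) == 2 and parts[0].upper() == "ADDR"
        operand = parts[1].strip() if is_addr else args[i].strip()
        op = "lea" if is_addr else "mov"
        if i < len(ARG_REGS):
            lines.append(f"{op} {ARG_REGS[i]}, {operand}")
        else:
            # The fifth argument lives at [rsp+20h], above the 32-byte
            # shadow space; a leading 0 keeps the hex literal valid.
            offset = 0x20 + 8 * (i - len(ARG_REGS))
            lines.append(f"{op} {SCRATCH}, {operand}")
            lines.append(f"mov [rsp+0{offset:X}h], {SCRATCH}")
    lines.append(f"call {func}")
    return lines

print("\n".join(emit_invoke("function", ["addr something", "1", "2", "3"])))
```

The sketch does no operand-size checking, so a 32-bit register such as eax would still produce a line the assembler rejects, and an argument that is itself r10 would be clobbered.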
OK, I see I have some competition here. I'll persist in my project nonetheless
This is a learning experience for me as well; going over my design, I can see some flaws that should be corrected (in the next version):
- Number handling: since I'm converting all #s (decimal or hex) to binary, they're going to look different from what the original coder wrote (plus I can't handle negative #s). In the next version, numbers will simply be treated as text (e.g., -1 or 0FFFFh or whatever), after validation.
- Since registers are treated as "unknown identifiers", it's possible to have nonsensical constructs like ADDR rsi. Easily fixed by adding all known registers to the ID match list and creating a new token type, say $T_register.
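A minimal Python sketch of that second fix: identifiers are looked up in a keyword table and a register list before falling back to "unknown", which lets the parser refuse ADDR in front of a register. Only the `$T_register` name comes from the post above; the other type names, the register list (64-bit general-purpose names only) and the helper functions are assumptions for illustration.

```python
# 64-bit general-purpose register names (assumed to be the full match list).
REGISTERS = {
    "RAX", "RBX", "RCX", "RDX", "RSI", "RDI", "RBP", "RSP",
    "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15",
}
KEYWORDS = {"INVOKE": "T_invoke", "ADDR": "T_addr"}   # hypothetical type names

def classify_id(text):
    """Return the token type for an identifier (case-insensitive)."""
    upper = text.upper()
    if upper in KEYWORDS:
        return KEYWORDS[upper]
    if upper in REGISTERS:
        return "T_register"
    return "T_unknown"

def addr_operand_ok(text):
    """ADDR only makes sense in front of a variable name, not a register."""
    return classify_id(text) == "T_unknown"

print(classify_id("rsi"), addr_operand_ok("rsi"), addr_operand_ok("MyVar"))
```

Because the change is confined to lookup data plus one extra token type, the tokenizer loop does not need to change.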
Probably there are other issues I haven't discovered yet.
Hopefully later today I'll have a full parser coded and posted here. We'll see.
Quote from: NoCforMe on July 10, 2023, 05:38:38 AM
it's possible to have nonsensical constructs like ADDR rsi.
Right, but that's going one step further: a syntax check, plus error messages...
Here's the parsing demo, attached below.
It works, and I believe it meets Hector's challenge (he'll have to be the judge of that). As I said, there are some problems with it, and certainly room for refinement. I'll put out another version soon that addresses at least some of these issues.
Notice that when entering any of the last 3 variables, they can be:
- the name of a variable
- the address of a variable (ADDR varname)
- a decimal or hexadecimal number (hex numbers have "h" appended, of course)
Notice that it correctly handles the difference between a variable value and its address:
varname --> MOV [register], varname
ADDR var --> LEA [register], varname
Also, with Hector's permission, I would allow the first parameter to be any of these things. I don't see any reason why it should be limited to ADDR varname. (But then I don't do 64-bit programming.)
I don't like the way it handles numbers. If you enter a # in hex it spits it out in decimal. The correct way to handle this will be to preserve the original text the user entered, including negative numbers, which will require changes to the tokenizer. That will also simplify the code by eliminating the numeric conversion routines.
* I learned something from this, which is that MASM accepts negative hex numbers. I didn't know this, as I've never used them.
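Keeping numbers as text still means checking that the text really is a number. One way to do that, sketched in Python with a regular expression: an optional minus sign, then either hex (leading digit, trailing "h"), binary (trailing "b") or plain decimal. The function name is hypothetical, and other MASM radix suffixes are deliberately left out because the thread only mentions these three forms.

```python
import re

# Optional sign, then hex (leading digit, 'h' suffix), binary ('b' suffix)
# or plain decimal. Case-insensitive, so 0ffffh and 0FFFFH both pass.
NUMBER_RE = re.compile(r"-?(?:[0-9][0-9A-F]*H|[01]+B|[0-9]+)", re.IGNORECASE)

def is_number(text):
    """Return True if text is a number in one of the accepted notations."""
    return NUMBER_RE.fullmatch(text) is not None

for sample in ["0FFFFh", "-1", "10100011B", "FFh", "12ab"]:
    print(sample, is_number(sample))
```

A token that passes the check can be copied to the output exactly as the user typed it, so no conversion routine is needed.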
One problem is that you can still enter nonsensical constructs, like ADDR RSI, because it doesn't know a register name from a hole in the ground. So I'll add all the register names to the list of known IDs and tag them as a register type to avoid this problem.
Let me know whatcha think.
Enter statement to test: >InVoke someApi123, 111h, eax, ADDR mytext, 123,456, addr MyVar
Tokenization error.
Enter statement to test: >invoke someApi, 111h, eax
Tokenization error.
What's wrong?
The previous version (INVOKEtokenizer) works fine:
Enter statement to test: >invoke MyAlgo, eax, 123, 456
TOKEN: INVOKE
TOKEN: Unknown ID: "MyAlgo"
TOKEN: comma
TOKEN: Unknown ID: "eax"
TOKEN: comma
TOKEN: number (123)
TOKEN: comma
TOKEN: number (456)
TOKEN: EOL. Tokenization completed successfully!
Quote from: HSE on July 10, 2023, 01:40:36 AM
Quote from: NoCforMe on July 09, 2023, 06:27:30 AM
One thing to realize is how easily this code flows from the diagram.
Mmm :biggrin:
How it move from B to C? (first example)
Sorry I missed your question before. You're talking about that trivial 1st example, right? If you're on node B, seeing a numeric character (#) will move you to node C.
Be sure to check out the parsing demo I just posted.
Quote from: jj2007 on July 10, 2023, 09:52:48 AM
Enter statement to test: >InVoke someApi123, 111h, eax, ADDR mytext, 123,456, addr MyVar
Tokenization error.
Enter statement to test: >invoke someApi, 111h, eax
Tokenization error.
What's wrong?
Sorry, should have been more explicit.
I'm following Hector's challenge pretty closely, where the first argument must be ADDR var. Try that. You also have to give 4 arguments, the last 3 of which can be any of what I described above.
Here's the template:
INVOKE function, ADDR var1, var2, var3, var4
There's no reason it has to work that way; I'm just following the format of the challenge. I'll modify it to accept any number of arguments.
Additional enhancement: For the sake of completeness, it should handle binary #s too (10100011B). Version after the next, maybe.
Never mind, I also just discovered a little glitch in my version :tongue:
invoke CreateWindowEx, WS_EX_CLIENTEDGE, Chr$("RichEdit20A"), NULL, reStyle, 0, 0, 1, 1, hWnd, ID_EDIT, wcx.hInstance, NULL
mov rcx, WS_EX_CLIENTEDGE
mov rdx, Chr$("RichEdit20A
mov r8, NULL
mov r9, reStyle
push 0
push 0
push 1
push 1
push hWnd
push ID_EDIT
push wcx.hInstance
push NULL
call CreateWindowEx
Correct version attached.
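The truncated `Chr$("RichEdit20A` above shows how easily an argument that contains quotes or parentheses gets damaged. One possible way to split arguments safely is sketched below in Python: commas only count when they are outside double quotes and outside parentheses, and a ";" outside quotes ends the statement. This is not the fix in the attachment, just an illustration with a made-up function name.

```python
def split_args(text):
    """Split on top-level commas; keep quoted strings and (...) intact."""
    args, current, depth, in_quotes = [], [], 0, False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth = max(depth - 1, 0)
            elif ch == ";":
                break                     # rest of the line is a comment
            elif ch == "," and depth == 0:
                args.append("".join(current).strip())
                current = []
                continue                  # do not keep the comma itself
        current.append(ch)
    args.append("".join(current).strip())
    return args

line = 'CreateWindowEx, WS_EX_CLIENTEDGE, Chr$("RichEdit20A"), NULL'
print(split_args(line))
```

Only whitespace is trimmed from each piece, so closing quotes and parentheses survive.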
Just for the record, it's mineiro's challenge :thumbsup:
I'm still on the trivial B-to-C thing :biggrin:
(solved now!!)
A few lines for testing your algos:
invoke MyAlgo, eax, "a string", FP4(456.789)
invoke MyAlgo, eax, 123, 456
InVoke someApi123, 111h, eax, ADDR mytext, 123,456, addr MyVar,addr MyVar2 ; comment
invoke CreateWindowEx, WS_EX_CLIENTEDGE, Chr$("RichEdit20A"), NULL, reStyle, 0, 0, 1, 1, hWnd, ID_EDIT, wcx.hInstance, NULL
Post yours, too, please :thup:
This version
- Allows the following for all 4 arguments:
  - Variable name
  - ADDR variable name
  - register
  - Decimal, hex or binary # (can be negative)
- Numbers are preserved exactly as given by user
I guess that's all for this version. I'd like to point out that these changes actually reduced the size of the code overall. (No more numeric conversion required, for one thing: numbers are handled as text.)
Some of the recent changes didn't require any change to the code. Instead, all the changes were to data structures, which to me is the beauty of this technique: it's data driven. Yes, it's a bit more complicated than a nested mess of conditional code statements, but once you get it running, it's so easy to expand it or make changes to the "grammar" of what you're parsing.
I hope someone tests this out and reports back to us.
Next version: remove the fixed requirement for 4 arguments, let it handle n arguments (for some reasonable limit of n, say 8 or 10).
Are there still special requirements for the invokes?
INVOKE--> code parser demo, version 2
Allows dec/hex/binary #, registers, var or ADDR var for all 4 args.
Enter statement to test: >invoke MyAlgo, eax, "a string", FP4(456.789)
Tokenization error.
Enter statement to test: >invoke MyAlgo, eax, 123, 456
Tokenization error.
Enter statement to test: >InVoke someApi123, 111h, eax, ADDR mytext, 123,456, addr MyVar,addr MyVar2 ; comment
Tokenization error.
Enter statement to test: >invoke CreateWindowEx, WS_EX_CLIENTEDGE, Chr$("RichEdit20A"), NULL, reStyle, 0, 0, 1, 1, hWnd, ID_EDIT, wcx.hInstance, NULL
Tokenization error.
JJ: Aaaaargh. You're pushing the boundaries to the breaking point.
Here's the rulez:
Format:
INVOKE function, arg1, arg2, arg3, arg4
where all args can be any of the following:
- The name of a variable
- The address of a variable (ADDR varname)
- A register name (only 64-bit names allowed)
- A decimal, hex or binary number (incl. negatives)
All case-insensitive. 4 arguments, no more, no less. No floating-point stuff. NO STRINGS ALLOWED! This is just a demo (besides, how would a string make sense here?)
Curiously, it should accept Chr$("RichEdit20A") as a valid identifier.
Now try it again.
:biggrin: a little obvious test.
Enter statement to test: >Enter statement to test: >INVOKE function, arg1, arg2, arg3, arg4
MOV R9, arg1
MOV R8, arg2
MOV RDX, arg3
MOV RCX, arg4
CALL function
That one worked, thanks Hector :thup:
Enter statement to test: >INVOKE function, arg1, arg2, arg3, arg4
MOV R9, arg1
MOV R8, arg2
MOV RDX, arg3
MOV RCX, arg4
CALL function
Enter statement to test: >INVOKE whatever, asdasd, 123, ecx
Tokenization error.
I just tested my version with invoke strings extracted from your source:
INVOKE WinMain, EAX
INVOKE ExitProcess, EAX
INVOKE StdOut, OFFSET ProgramHeading
INVOKE StdIn, OFFSET InputBuffer, SIZEOF InputBuffer
INVOKE StdOut, OFFSET CRLFstr
INVOKE wsprintf, ADDR buffer, OFFSET CALLfmt,
INVOKE StdOut, ADDR buffer
INVOKE strcmpi, OFFSET TextBuffer, [EBX].$T_entry.T_IDptr
INVOKE strcpy, ECX, OFFSET FnameStorage
INVOKE strcpy, ECX, OFFSET Var1Storage + 1
INVOKE strcpy, ECX, OFFSET Var2Storage + 1
INVOKE strcpy, ECX, OFFSET Var3Storage + 1
INVOKE strcpy, ECX, OFFSET Var4Storage + 1
And I found a little bug :sad:
It's fixed, see attached version 3.
Enter statement to test: >INVOKE function, RDX, RCX, RAX, RBX
Tokenization error.
Also X64 ABI is inverted in result.
Quote from: jj2007 on July 11, 2023, 06:21:49 AM
Enter statement to test: >INVOKE whatever, asdasd, 123, ecx
Tokenization error.
Y'see, that one failed for a reason: it violated the grammar defined for the statement. (Which I gave above.) That's exactly what a parser is spozed to do.
Quote from: HSE on July 11, 2023, 06:24:01 AM
Enter statement to test: >INVOKE function, RDX, RCX, RAX, RBX
Tokenization error.
Also X64 ABI is inverted in result.
Aaaaargh; that one shoulda worked. Back to the lab.
Fixed. Updated code attached to previous reply up there.
Regarding the correct order of args and registers: I'll fix that in the next, generalized version.
Quote from: NoCforMe on July 11, 2023, 06:25:49 AM
Quote from: jj2007 on July 11, 2023, 06:21:49 AM
Enter statement to test: >INVOKE whatever, asdasd, 123, ecx
Tokenization error.
Y'see, that one failed for a reason: it violated the grammar defined for the statement. (Which I gave above.) That's exactly what a parser is spozed to do.
Quote from: NoCforMe on July 11, 2023, 06:12:09 AM
Here's the rulez:
Format: INVOKE function, arg1, arg2, arg3, arg4
where all args can be any of the following:
- The name of a variable
- The address of a variable (ADDR varname)
- A register name (only 64-bit names allowed)
- A decimal, hex or binary number (incl. negatives)
All case-insensitive. 4 arguments, no more, no less. No floating-point stuff. NO STRINGS ALLOWED!
INVOKE whatever, asdasd, 123, ecx failed, but
INVOKE whatever, asdasd, 123, rcx, 456 worked, congrats :thumbsup:
P.S.: Here is my output for the invoke strings extracted from your source, see version 3 above:
-------- Sample text: --------
INVOKE WinMain, EAX
mov rcx, EAX
call WinMain
-------- Sample text: --------
INVOKE ExitProcess, EAX
mov rcx, EAX
call ExitProcess
-------- Sample text: --------
INVOKE StdOut, OFFSET ProgramHeading
mov rcx, OFFSET ProgramHeading
call StdOut
-------- Sample text: --------
INVOKE StdIn, OFFSET InputBuffer, SIZEOF InputBuffer
mov rcx, OFFSET InputBuffer
mov rdx, SIZEOF InputBuffer
call StdIn
-------- Sample text: --------
INVOKE StdOut, OFFSET CRLFstr
mov rcx, OFFSET CRLFstr
call StdOut
-------- Sample text: --------
INVOKE wsprintf, ADDR buffer, OFFSET CALLfmt,
lea rcx, buffer
mov rdx, OFFSET CALLfmt
mov r8, ñm³v
call wsprintf
-------- Sample text: --------
INVOKE StdOut, ADDR buffer
lea rcx, buffer
call StdOut
-------- Sample text: --------
INVOKE strcmpi, OFFSET TextBuffer, [EBX].$T_entry.T_IDptr
mov rcx, OFFSET TextBuffer
mov rdx, [EBX].$T_entry.T_IDptr
call strcmpi
-------- Sample text: --------
INVOKE strcpy, ECX, OFFSET FnameStorage
mov rcx, ECX
mov rdx, OFFSET FnameStorage
call strcpy
-------- Sample text: --------
INVOKE strcpy, ECX, OFFSET Var1Storage + 1
mov rcx, ECX
mov rdx, OFFSET Var1Storage + 1
call strcpy
-------- Sample text: --------
INVOKE strcpy, ECX, OFFSET Var2Storage + 1
mov rcx, ECX
mov rdx, OFFSET Var2Storage + 1
call strcpy
-------- Sample text: --------
INVOKE strcpy, ECX, OFFSET Var3Storage + 1
mov rcx, ECX
mov rdx, OFFSET Var3Storage + 1
call strcpy
-------- Sample text: --------
INVOKE strcpy, ECX, OFFSET Var4Storage + 1
mov rcx, ECX
mov rdx, OFFSET Var4Storage + 1
call strcpy
I should fix the mov rcx, EAX stuff :rolleyes:
So, a couple general questions here:
1. What's the purpose of this conversion, anyhow? I mean apart from being the basis of a parsing demo, which I'm all in on.
What's wrong with using the INVOKE macro as-is? Why would someone want to unroll the code this way?
2. My next version of the demo will generalize it, so it won't be limited to strictly 4 arguments as it is now. But after looking at the x64 calling convention, I think I'll stop here:
- Allow up to 8 arguments, the first 4 in registers, the last 4 pushed on the stack
- No floating-point stuff (and therefore no SSE registers, xmm0-xmm3). That's an exercise for the reader, as they say.
I think that's reasonable; after all, this is a parsing demo, not an exhaustive example of x64 usage. (Hell, I don't even use any 64-bit stuff myself!)
Anyhow, next (and probably last) version coming soon ...
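To make the plan above concrete, here is a rough sketch of the code-generation step, written in Python rather than assembly purely for illustration. It is not the demo's actual code. The register order (RCX, RDX, R8, RDX's neighbours R8 and R9) is the Microsoft x64 order; the function name `emit_call` and the choice to push stack arguments last-to-first are assumptions of this sketch, and shadow space and stack alignment are deliberately ignored.

```python
# Illustrative sketch only (hypothetical names, not the demo's code).
# First four arguments go to registers in Microsoft x64 order;
# anything beyond that is pushed on the stack.
ARG_REGISTERS = ["RCX", "RDX", "R8", "R9"]
MAX_ARGS = 8  # limit proposed for the demo


def emit_call(function, args):
    """Return the assembly lines for one INVOKE statement."""
    if len(args) > MAX_ARGS:
        raise ValueError("too many arguments")
    lines = []
    # Assumption: stack arguments are pushed last-to-first so that
    # argument 5 ends up closest to the top of the stack.
    for arg in reversed(args[len(ARG_REGISTERS):]):
        lines.append(f"PUSH {arg}")
    # zip() stops at the shorter sequence, so fewer than 4 args is fine.
    for reg, arg in zip(ARG_REGISTERS, args):
        lines.append(f"MOV {reg}, {arg}")
    lines.append(f"CALL {function}")
    return lines


if __name__ == "__main__":
    print("\n".join(emit_call("function", ["arg1", "arg2", "arg3", "arg4"])))
```

Because the register list is just data, raising or lowering the argument limit means changing a constant rather than adding another branch.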
Quote from: NoCforMe on July 11, 2023, 06:59:36 AM
basis of a parsing demo, which I'm all in on.
:thumbsup: Too much complexity is hard to follow. Perhaps the demo only has to be difficult enough to serve as the third example in your essay, and no more.
Quote from: NoCforMe on July 11, 2023, 06:59:36 AM
I don't even use any 64-bit stuff myself
Perhaps you haven't seen masm64 SDK invoke macro. It can be written a little more beautifully:
invoke MACRO fname:REQ,args:VARARG
procedure_call fname,args
ENDM
procedure_call MACRO fname:REQ,a1:VARARG
LOCAL lead,wrd2,ssize,sreg,svar
arg1_n = 0
FOR arg2, <a1>
;; **************************
;; first 4 register arguments
;; **************************
IF arg1_n eq 0
REGISTER arg2,cl,cx,ecx,rcx,xmm0
ENDIF
IF arg1_n eq 1
REGISTER arg2,dl,dx,edx,rdx,xmm1
ENDIF
IF arg1_n eq 2
REGISTER arg2,r8b,r8w,r8d,r8,xmm2
ENDIF
IF arg1_n eq 3
REGISTER arg2,r9b,r9w,r9d,r9,xmm3
ENDIF
;; **************************
;; following stack arguments
;; **************************
IF arg1_n gt 3
STACKARG arg2,arg1_n*8
ENDIF
arg1_n = arg1_n + 1
ENDM
call fname
ENDM
Not so ugly spaghetti :biggrin:
Quote from: HSE on July 11, 2023, 07:24:01 AM
Quote from: NoCforMe on July 11, 2023, 06:59:36 AM
basis of a parsing demo, which I'm all in on.
:thumbsup: Too much complexity is hard to follow. Perhaps the demo only has to be difficult enough to serve as the third example in your essay, and no more.
Well, I apologize for that. But the complexity here follows from the requirements of the demo, which are not trivial. My hope is that the underlying concepts--using a FSA for tokenization and a linked list for parsing--will somehow reveal themselves to the curious here despite the complexity. And hey, it's not that complicated!
Quote
Not so ugly spaghetti :biggrin:
Yes, but that isn't a parser. Mine is.
Quote from: NoCforMe on July 11, 2023, 07:31:45 AM
Yes, but that isn't a parser. Mine is.
:thumbsup: Just a kind of "lexical analysis" (MASM has the macro tokenizer)
Quote from: HSE on July 11, 2023, 07:24:01 AM
Perhaps you haven't seen masm64 SDK invoke macro.
Perhaps you haven't seen the jinvoke macro - it's a factor 12 bigger but works with MASM, UAsm and AsmC, checks argument counts and types of arguments, just like the original 32-bit MASM invoke macro :biggrin:
jinvoke MACRO apiarg, args:VARARG
Local tmp$, tmpA$, api$, apx$, apinum, dllnum, info$, inf1$, c1$, is, isCrt, isXmm, oa
Local isR, curSlot, curApi, rev$, isO, isOl, ctArgs, rspExtra, cVarArg, pushReg
; tmp$ CATSTR <jinv &apiarg&, line >, %@Line, < with _jbInit=>, %jbInit, <, _jbPBI=>, %jbPBI, <, _jbPrologRun=>, %jbPrologRun
; % echo tmp$
ifdef needsnop
mov rbp, rbp
nops 4
endif
ife jbPrologRun
if @64
; nop
endif
endif
api$ CATSTR <apiarg>
isCrt INSTR api$, <crt_>
if isCrt
api$ SUBSTR api$, 5
endif
apx$ CATSTR <j@>, api$
; % echo ____ api$ -> apx$
is INSTR 1, apx$, </>
ife is
tmp$ CATSTR <LABEL >, <apiarg>
% echo ____ LABEL apiarg uses invoke apx$ ____
invoke apiarg, args
else
; % echo -------- Hello.... DefPrc$ in [api$] or [apiarg]
isR INSTR DefPrc$, api$ ; j#myalgo#s1441144
if isR eq 3
% echo -- Hello DP: [DefPrc$] and info: [info$]
; .err ; info$
apinum equ <-1>
isR INSTR isR+1, DefPrc$, <#>
dllnum equ <15000>
info$ SUBSTR DefPrc$, isR+1
info$ CATSTR info$, <xxxxxxxxxxxxxxxxxxxx>
else
apinum SUBSTR apx$, 1, is-1
isR INSTR is+1, apx$, <:>
dllnum SUBSTR apx$, is+1, isR-is-1
info$ SUBSTR apx$, isR+1
; % echo info: [info$] ; s1441...
info$ CATSTR info$, <xxxxxxxxxxxxxxxxxxxx>
endif
; tmp$ CATSTR <api >, <apiarg>, < has ID >, %apinum, < and DLL >, %dllnum, <, info=>, info$
; % echo tmp$
; define a new variable jdll@123
tmp$ CATSTR <jdll@>, dllnum
is = opattr tmp$
if is eq 36 ; immediate
curSlot=tmp$ ; already defined
else
curSlot=jaCtDll
if dllnum ge 0 ; <0 is own proc
tmp$ CATSTR tmp$, <=>, %jaCtDll
tmp$
tmp$ CATSTR <jd@>, %dllnum ; jd@3 equ advapi32
; % echo ## DLL: tmp$
tmp$ CATSTR <txDll>, %curSlot, < equ db ">, tmp$, <", 0>
; % echo tmp$
tmp$
jaCtDll=jaCtDll+1
endif
endif
; define a new variable j@123
tmp$ CATSTR <j@>, %apinum
is = opattr tmp$
if is eq 36
; echo ####### tmp$ already defined ###### ; immediate
curApi equ tmp$
else
is INSTR tmp$, <j@> ; jTypeChk follows below
if is
tmp$ CATSTR tmp$, <=>, %jaCtApi
% tmp$ ; the % is for ML64 (erratic errors)
if apinum gt 50000
if jbVerbose
% echo own api$
endif
curApi equ <-1>
else
tmp$ CATSTR <txApi>, %jaCtApi, < equ ap>, %jaCtApi, <$ db curSlot+1, ">, api$, <", 0>
if jbVerbose
% echo Win api$
endif
% tmp$ ; the % is for ML64 (erratic errors)
curApi=jaCtApi
jaCtApi=jaCtApi+1 ; this is total, not current
endif
else
curApi equ tmp$
endif
endif
ifidn <args>, <@>
call iaApi[SIZE_P*curApi]
EXITM
elseifidn <args>, <@def@>
EXITM
elseifidn <args>, <@address>
mov rax, iaApi[SIZE_P*curApi]
EXITM
endif
isR=0
cVarArg INSTR info$, <c6x>
rev$ equ <# >
for arg, <args> ; REVERSE
isR=isR+1
tmp$ CATSTR <arg>, < >
tmp$ SUBSTR tmp$, 1, 4
rev$ CATSTR <arg>, <#>, rev$
inf1$ SUBSTR info$, isR+1, 1
ifidn inf1$, <x>
ife cVarArg
; % echo info$/inf1$
tmp$ CATSTR <## line >, %@Line, <: too many arguments for &apiarg& ##>
% echo tmp$
.err
endif
endif
ifdifi tmp$, <addr>
if @InStr(1, <arg>, <&>) ne 1
if @InStr(1, <arg>, <*>) ne 1
if type arg eq REAL8
ifdif inf1$, <3> ; :s131
;.err <## REAL8 not expected ##>
endif
elseif type arg eq QWORD
ifidn inf1$, <3>
; % echo info$/inf1$
.err <## REAL8 expected ##>
endif
endif
endif
endif
endif
endm
is INSTR info$, <x>
if is gt isR+2
ife cVarArg
; tmp$ CATSTR <i=>, %is, <, r=>, %isR
; % echo tmp$
; % echo info$/inf1$
tmp$ CATSTR <## line >, %@Line, <: not enough arguments for &apiarg& ##>
% echo tmp$
.err
endif
endif
ctArgs=isR
rspExtra=0
if @64
if ctArgs GT 4
is=ctArgs mod 4
rspExtra=ctArgs/4
ife jbCompStyle
REPEAT 4-is
push rbx ; r8 is a 2-byter, so we take rbx
ENDM
endif
elseif ctArgs LT 4 ; can be merged
ife jbCompStyle
repeat 4-ctArgs ; 1...3 dummy pushes, rest pushed below
push rbx
endm
endif
endif
if isR GT jbArgsUsed+20 ; ?????
jbArgsUsed=isR+20
.err <isr gt argsused>
endif
endif
; tmp$ CATSTR <rev=>, rev$
; % echo tmp$
; if usedeb
; mov rsi, rsi ; for debugging
; int 3
; endif
; % echo API: api$, INFO: info$
; mov rsp, rsp ; ----- start moving args into stack ---------
While isR ; push in right order: rcx rdx r8 r9 pushed5 pushed6 etc
isR=isR-1
is INSTR rev$, <#>
tmp$ SUBSTR rev$, 1, is-1
c1$ SUBSTR rev$, 1, 1 ; only for & and *
tmpA$ CATSTR tmp$, < >
tmpA$ SUBSTR tmpA$, 1, 4
isOl=0 ; 0=no addr, offset, * or &
ifidni tmpA$, <offs>
isOl=7 ; substr must compensate offset characters
elseifidni tmpA$, <addr>
isOl=5 ; substr must compensate addr characters
elseifidn c1$, <&>
isOl=1
elseifidn c1$, <*>
isOl=1
endif
if @64
pushReg equ <r10>
pushRegD equ <r10d>
if isR eq 0
pushReg equ <rcx>
pushRegD equ <ecx>
elseif isR eq 1
pushReg equ <rdx>
pushRegD equ <edx>
elseif isR eq 2
pushReg equ <r8>
pushRegD equ <r8d>
elseif isR eq 3
pushReg equ <r9>
pushRegD equ <r9d>
endif
csDest equ [rsp+8*isR] ; [rbp+x] is same size in X64
if jbCompStyle ; always, it's the default now
c1$ SUBSTR info$, isR+2, 1
; if apinum gt 50000 and usedeb
; oa INSTR info$, <x>
; if oa
; tmpx$ SUBSTR info$, 1, oa
; else
; tmpx$ CATSTR info$
; endif
; oa = (opattr tmp$) AND 127
; tmpx$ CATSTR <Count=>, %isR, <: arg=[>, tmp$, <], c=>, c1$, < in >, tmpx$, <, o=>, %oa
; % echo tmpx$
; endif
noMem=4 ; and (useCB eq 0)
if isOl
tmp$ SUBSTR tmp$, 1+isOl
; if jbVerbose
; % echo off tmp$ ; not very useful
; endif
lea pushReg, tmp$ ; addr or offset
if isR ge noMem
mov csDest, pushReg
endif
elseif type(tmp$) eq REAL8 ; REAL8 to xmm
jTypeChk cVarArg, isR, c1$, <4REAL8>, api$
if isR ge noMem
movlps xmm0, tmp$ ; real8 to xmm0 (no conversion)
movlps qword ptr csDest, xmm0
else ; first 4 in xmm? and rcx rdx r8 r9
; % echo DEST: csDest
tmp2$ CATSTR <movlps xmm>, %isR, <,>, tmp$
; % echo **** Passing a REAL8: tmp2$
% tmp2$
if cVarArg ; Parameter passing: Floating-point values are only placed in the integer registers RCX, RDX, R8, and R9 when there are varargs arguments
tmp2$ CATSTR <movd >, pushReg, <, xmm>, %isR
; % echo **** Passing a REAL8 both to xmm? and reg64: tmp2$
% tmp2$
endif
endif
elseif type(tmp$) eq REAL4 ; REAL4 to xmm
jTypeChk cVarArg, isR, c1$, <3REAL4>, api$
if isR ge noMem
movd xmm0, tmp$ ; real4 to xmm0 (no conversion)
movd dword ptr csDest, xmm0
else ; first 4 in xmm? and rcx rdx r8 r9
; % echo DEST: csDest
tmp2$ CATSTR <movd xmm>, %isR, <,>, tmp$
; % echo **** Passing a REAL8: tmp2$
% tmp2$
if cVarArg ; Parameter passing: Floating-point values are only placed in the integer registers RCX, RDX, R8, and R9 when there are varargs arguments
tmp2$ CATSTR <movd >, pushReg, <, xmm>, %isR
; % echo **** Passing a REAL8 both to xmm? and reg64: tmp2$
% tmp2$
endif
endif
elseif type(tmp$) LT SIZE_P ; zero-extend
; % echo xx tmp$ xx less than size_p
jTypeChk cVarArg, isR, c1$, <1DWORD>, api$ ; let's check if the callee wants something else
oa = opattr tmp$
if oa eq atImmediate
if isR lt noMem ; use registers
ife tmp$
xor pushRegD, pushRegD ; shortest option?
else
if tmp$ eq -1
xor pushRegD, pushRegD
dec pushReg
elseif tmp$ LT 0
mov pushReg, tmp$
else
mov pushRegD, tmp$
endif
endif
; no mov csDest, pushReg
else ; move immediate into stack
ife tmp$
and qword ptr csDest, 0 ; shortest option; dword is ok for regs but not mem
else
if tmp$ eq -1
or qword ptr csDest, -1
else
mov qword ptr csDest, tmp$
endif
endif
endif
elseif oa eq atRegister
mov csDest, tmp$
else
if type tmp$ LT DWORD
movsx pushRegD, tmp$
else
mov pushRegD, tmp$
endif
if isR ge noMem
mov csDest, pushReg
endif
endif
else ; SIZE_P (s-code 1)
isXmm INSTR tmp$, <xmm>
if isXmm eq 1
jTypeChk cVarArg, isR, c1$, <4REAL8>, api$ ; TypeCheck: 4, 3REAL4, c1=[3]
; % echo A: movlps csDest, tmp$
movlps QWORD ptr csDest, tmp$
if isR lt noMem
if cVarArg
; % echo B: mov pushReg, tmp$
movd pushReg, tmp$
endif
endif
else
jTypeChk cVarArg, isR, c1$, <1DWORD>, api$ ; TypeCheck: 4, 3REAL4, c1=[3]
ifdifi pushReg, tmp$
if isR ge noMem
oa = (opattr tmp$) AND 127
if oa eq atRegister
mov csDest, tmp$
else
mov pushReg, tmp$
mov csDest, pushReg
endif
else
mov pushReg, tmp$
endif
else
if isR ge noMem ; otherwise fastcall
mov csDest, tmp$ ; same as pushReg
endif
endif
endif
endif
else ; vvv 32-bit code vvv
if isOl
tmp$ SUBSTR tmp$, 1+isOl
lea pushReg, tmp$ ; addr or offset
mPush pushReg ; 32-bit code
elseif type tmp$ eq REAL4
mov pushRegD, tmp$ ; use 32-bit instruction
mPush pushReg
tmp$ CATSTR <** Warning, line >, %@Line, <: passing a REAL4 may not work **>
% echo tmp$
elseif type tmp$ LT SIZE_P ; zero-extend
oa = opattr tmp$
if oa eq atImmediate
ife tmp$
xor pushRegD, pushRegD
push pushReg
else
push tmp$
if isR LT 4
mov pushReg, [rsp]
endif
endif
else
if type tmp$ LT DWORD
movsx pushRegD, tmp$
else
mov pushRegD, tmp$
endif
mPush pushReg
endif
else
ifdifi pushReg, tmp$
mov pushReg, tmp$
endif
mPush pushReg
endif
endif
else ; v v 32-bit code v v
if isOl ; <addr>
if isOl eq 7
mPush tmp$
else
tmp$ SUBSTR tmp$, 1+isOl
oa = (opattr tmp$) AND 127
if oa eq atGlobal
push offset tmp$
else
lea edx, tmp$
mPush edx
endif
endif
elseif (type tmp$ eq REAL8) or (type tmp$ eq QWORD) ; see add rsp
mPush dword ptr tmp$[4]
mPush dword ptr tmp$
rspExtra=rspExtra+4
else
isXmm INSTR tmp$, <xmm>
ife isXmm
mPush tmp$
else
push eax
endif
endif
endif
; % echo ---- rev$ -------
rev$ SUBSTR rev$, is+1
ENDM
; mov rbp, rbp ; ----- end moving args ---------
if @64
; tmp$ CATSTR <apiarg>, < has >, %ctArgs, < paras>
; % echo tmp$
if 0
mPush arg6 ; sixth parameter ; mazegen
mPush arg5 ; fifth parameter
sub rsp, 4*8 ; allocate space for 'Register Parameter Stack Area'
mov r9, arg4
mov r8, arg3
mov rdx, arg2
mov rcx, arg1
call function ; inactive
add rsp, 4*8 + 2*8 ; release all parameters from stack
endif
endif
ifidn api$, <ExitProcess>
j@ExDone=1
endif
if jbStrings
tmp$ CATSTR <CALL >, api$, < as >, %(curApi), </>, %jaCtApi
% echo tmp$
endif
if usedeb and apinum lt 50000 ; --- solved for x64 with syms and VS14, see bax ---
ife @64
tmp2$ CATSTR <mov edx, Chr$(">, api$, <")>
; % echo api$: tmp2$
% tmp2$
endif
endif
if @64X
; sub rsp, 4*8
rspExtra=rspExtra+1
endif
if apinum gt 50000
call api$ ;; user proc
else
call iaApi[SIZE_P*curApi]
endif
is INSTR info$, <c>
if (is eq 1) and (@64 eq 0)
add rsp, 4*ctArgs+rspExtra ; 32-bit C stack correction; rspex for QWORD
endif
if rspExtra and jbCompStyle eq 0
add rsp, rspExtra*32
endif
endif
ENDM
Quote from: jj2007 on July 11, 2023, 07:51:14 AM
Perhaps you haven't seen the jinvoke macro
Yes. That is an ugly spaghetti :biggrin: :biggrin:
Quote from: HSE on July 11, 2023, 07:54:35 AM
Quote from: jj2007 on July 11, 2023, 07:51:14 AM
Perhaps you haven't seen the jinvoke macro
Yes. That is an ugly spaghetti :biggrin: :biggrin:
I knew you would like it :greensml:
Look how it translates the CreateWindowEx into real, efficient code:
int 3
jinvoke CreateWindowEx, WS_EX_CLIENTEDGE, Chr$("RichEdit20A"), NULL, reStyle, 0, 0, 1, 1, hWnd, ID_EDIT, wcx.hInstance, NULL
nop
int3 |
and [rsp+58],0 | NULL
mov r10,[140003054] | wcx
mov [rsp+50],r10 | wcx
mov [rsp+48],6F | ID_EDIT
mov r10d,[rbp+10] | hWnd
mov [rsp+40],r10 | hWnd
mov [rsp+38],1 | 1
mov [rsp+30],1 | 1
and [rsp+28],0 | 0
and [rsp+20],0 | 0
mov r9d,503001C4 | reStyle
xor r8d,r8d | NULL
lea rdx,[1400030B7] | Chr$("RichEdit20A")
mov ecx,200 | WS_EX
call [<&CreateWindowExA>] |
nop |
P.S.: I fixed the mov rax, eax bug (version 4 attached):
-------- Sample text: --------
invoke CreateWindowEx, WS_EX_CLIENTEDGE, Chr$("RichEdit20A"), NULL, reStyle, 0, 0, 1, 1, hWnd, ID_EDIT, wcx.hInstance, NULL
mov rcx, WS_EX_CLIENTEDGE
mov rdx, Chr$("RichEdit20A")
mov r8, NULL
mov r9, reStyle
push 0
push 0
push 1
push 1
push hWnd
push ID_EDIT
push wcx.hInstance
push NULL
call CreateWindowEx
-------- Sample text: --------
INVOKE WinMain, EAX
mov ecx, EAX
call WinMain
-------- Sample text: --------
INVOKE ExitProcess, EAX
mov ecx, EAX
call ExitProcess
Of course, this is still a push orgy, so it's not real code as shown above.
Quote from: NoCforMe on July 11, 2023, 07:31:45 AM
And hey, it's not that complicated!
:biggrin: So far (I'm on the second example), for these simple things, spaghetti is easier.
But better than modifying a spaghetti is to begin from zero. Then we have the chance to understand how these table-driven FSMs can be built (for more complex cases). :thumbsup:
Here's the latest version. Takes from 1 to 8 arguments, places the first 4 in registers, pushes any others on the stack.
Try it out.
I think this is as far as I go with this demo; it has met (and exceeded) the challenge by mineiro.
It'd be nice to get some feedback on this. I'm thinking of making an evaluation form, with questions like this:
A. What is your overall opinion of this demo?
- I think it's great and can't wait to implement it.
- It's interesting, but maybe some other day.
- Not sure about this.
- You'd have to pay me to even think about using this!
- I'd never use this even if you paid me!
Ha ha, just kidding. But seriously, give me some feedback here. Like I said, this isn't for everyone, but I think it demonstrates an important and very useful technique in text analysis.
It may seem complex, but believe me, after doing two or three of these, it's very easy to start a new parser from scratch. It's like riding a bicycle; the first few times are hard, but it becomes second nature after that. A lot of stuff can be block-copied to save coding time. And you can build very extensive parsers with this method. Just to show you, here's a command file for a graph-making program I did a long time ago that uses my parsing methods:
;===============================================
; Sample MAKEGRAF control file (test.gcf)
;===============================================
;text (location=(400, 60) text="Index #16" color=16)
;text (location=(20, 180) text="Index #1" color=1)
;text (location=(20, 140) text="Index #2" color=2)
;text (location=(20, 100) text="Index #3" color=3)
;text (location=(20, 60) text="Index #4" color=4)
;text (location=(100, 180) text="Index #5" color=5)
;text (location=(100, 140) text="Index #6" color=6)
;text (location=(100, 100) text="Index #7" color=7)
;text (location=(100, 60) text="Index #8" color=8)
;text (location=(200, 180) text="Index #9" color=9)
;text (location=(200, 140) text="Index #10" color=10)
;text (location=(200, 100) text="Index #11" color=11)
;text (location=(200, 60) text="Index #12" color=12)
;text (location=(400, 180) text="Index #13" color=13)
;text (location=(400, 140) text="Index #14" color=14)
;text (location=(400, 100) text="Index #15" color=15)
;text (location=(500, 60) text="!" color=16)
palette (load "test.gpf"
13(255,0,0) ;define red as RED!
)
graph (
size=(640,400)
filename="test.bmp"
bgcolor=11
)
grid (llcorner=(80, 60)
gridcolor=16
axisthickness=2
width=500
height=300
bgcolor=7)
line(start=(100,75)end=(250,120))
line(start=(250,120)end=(310,270))
line(start=(310,270)end=(450,120))
line(start=(450,120)end=(460,300))
font="5x9.sff"
font="7x11.sff"
; This statement shows a bug: horizontal rotated text doesn't render properly:
;text(location=(400,200) text="Weird; rotated horizontal text" rotation=TRUE)
text (location=(200, 390) text = "Civilians Killed in Iraq (M)" color=1)
text (location=(40, 380) text="3!#$&%*/(0123456789)" color=6 font="5x9.sff")
text (location=(40, 50)
text="!#$&%*/(0123456789):;@<=>ABCDEFGHIJKLMNOPQRSTUVWXYZ?[\\]^`ab"
color=2 font="7x11.sff")
text (location=(40, 30)
text=".,'\"cdefghijklmnopqrstuvwxyz{|}~"
color=2 font="7x11.sff")
text(location=(50,350)text="0123456789" color=13 font="7x11.sff" direction=vert)
text(location=(50,100)text="ROTATED TEXT" color=14 font="7x11.sff" direction=vert rotation=true)
dot (location=(100,76) color=13)
dot (location=(250,121) color=13)
dot (location=(310,270) color=13 shape=square)
Quote from: NoCforMe on July 11, 2023, 03:14:15 PM
It'd be nice to get some feedback on this. I'm thinking of making an evaluation form, with questions like this:
A. What is your overall opinion of this demo?
- I think it's great and can't wait to implement it.
- It's interesting, but maybe some other day.
- Not sure about this.
- You'd have to pay me to even think about using this!
- I'd never use this even if you paid me!
6. You are almost there :thumbsup:
Using invoke lines from your latest source:
INVOKE--> code parser demo, version 4
Allows dec/hex/binary #, registers, var or ADDR var for
up to 8 arguments (requires at least 1).
Enter statement to test: >INVOKE WinMain, EAX
MOV RCX, EAX ; <<<<<<<<<<<<<< error
CALL WinMain
Enter statement to test: >INVOKE ExitProcess, EAX
MOV RCX, EAX
CALL ExitProcess
Enter statement to test: >INVOKE StdOut, OFFSET ProgramHeading
Tokenization error.
Enter statement to test: >INVOKE StdIn, OFFSET InputBuffer, SIZEOF InputBuffer
Tokenization error.
Enter statement to test: >INVOKE StdOut, OFFSET CRLFstr
Tokenization error.
Enter statement to test: >INVOKE wsprintf, ADDR buffer, OFFSET CALLfmt
Tokenization error.
Enter statement to test: >INVOKE strcmpi, OFFSET TextBuffer, [EBX].$T_entry.T_IDptr
For testing, it might be easier to use a text file with examples, like the attached one, instead of typing all the time.
Quote from: jj2007 on July 11, 2023, 06:11:32 PM
Enter statement to test: >INVOKE WinMain, EAX
MOV RCX, EAX ; <<<<<<<<<<<<<< error
CALL WinMain
No error, that result is correct! For this FSM, EAX is a variable name. :biggrin:
Thanks, Héctor. People keep throwing stuff at my poor li'l parser thinking it knows the entire universe of MASM symbols. It doesn't, just a limited subset of them. Think of how complex it would have to be in order to handle expressions like
[RDX].Table + 12
[RAX + RBX + Table]
[RAX+RBX+Table]
Ironically, my parser can handle that last one, since there are no embedded spaces; it's just another "unknown identifier" to it:
Enter statement to test: >invoke function, [RDX+RAX+Table]
MOV RCX, [RDX+RAX+Table]
CALL function
But it has no idea what all those particles within it mean.
So have I met the original challenge (mineiro's)?
I am now downloading your program.
I intend to play with your toy during this week; if I make any changes I will post them in this topic, with your permission.
Thank you, sir NoCforMe.
Quote from: NoCforMe on July 12, 2023, 04:41:47 AM
Thanks, Héctor. People keep throwing stuff at my poor li'l parser
Invoke someproc, [RDX].Table + 12, [RAX + RBX + Table], [RAX+RBX+Table]
mov rcx, [RDX].Table + 12
mov rdx, [RAX + RBX + Table]
mov r8, [RAX+RBX+Table]
call someproc
That'll make it through my demo if you remove the spaces:
Enter statement to test: >Invoke someproc, [RDX].Table + 12, [RAX + RBX + Table], [RAX+RBX+Table]
Tokenization error.
Enter statement to test: >Invoke someproc, [RDX].Table+12, [RAX+RBX+Table], [RAX+RBX+Table]
MOV RCX, [RDX].Table+12
MOV RDX, [RAX+RBX+Table]
MOV R8, [RAX+RBX+Table]
CALL someproc
No way to fix that spaces problem? After all, the token delimiter is clearly the comma...
Quote from: jj2007 on July 12, 2023, 07:37:20 AM
No way to fix that spaces problem? After all, the token delimiter is clearly the comma...
JJ, you really don't seem to understand what's going on here. Yes, I could "fix the spaces problem" by only recognizing the comma as the delimiter. But first the more trivial problem: that would mean that spaces would be included in any identifier, like, say, "RAX " or "varName " where the user put a space between the ID and the comma (which of course is allowed in MASM syntax). Which would mess up the formatting. (Would probably still produce valid assemble-able code, but still.)
But the more important problem is that the parser still wouldn't understand at all what the component parts of the expression are, and which sequences of them are legal and which are not. Which you can see is a non-trivial problem, one which is waaaaay beyond the scope of what was spozed to be a somewhat simple demo.
Later: I tried what you suggested, which was to allow a space to be part of an identifier--super-easy change, just change one of the jump targets in the tokenization table--but that broke the whole thing. Scratched my head for a bit, why did that happen? Wellll, because a space is a delimiter, between "invoke" and the function name. So that won't work.
The only proper way to do it would be to handle the universe of address expressions, which is enormous. Not gonna happen for this demo.
Quote from: NoCforMe on July 12, 2023, 07:53:48 AM
JJ, you really don't seem to understand what's going on here. Yes, I could "fix the spaces problem" by only recognizing the comma as the delimiter. But first the more trivial problem: that would mean that spaces would be included in any identifier, like, say, "RAX " or "varName " where the user put a space between the ID and the comma
Shouldn't be too hard to do a little 'preprocessing'. I have a qe plugin (fixpunc) that removes any space(s) before a comma and places a single space after the comma & removes any extraneous spaces after the comma (if more than 1). Not that you need a qe plugin, but the algo is very simple... :icon_idea: src is the source buffer, dst is the destination buffer...
fixpunc proto :dword, :dword
.code
fixpunc proc src:dword, dst:dword
mov ecx, src
mov edx, dst
top:
mov al, [ecx]
cmp al, 0
jz done
cmp al, ","
jz comma1
mov [edx], al
inc ecx
inc edx
jmp top
comma1:
cmp byte ptr [edx-1], 20h
jnz @f
dec edx
jmp comma1
@@:
mov [edx], al
inc edx
@@:
inc ecx
cmp byte ptr [ecx], 20h
jnz movcomm
mov al, [ecx]
mov [edx], al
inc edx
@@:
inc ecx
cmp byte ptr [ecx], 20h
jz @B
jmp top
movcomm:
mov byte ptr [edx], 20h
inc edx
jmp top
done:
ret
fixpunc endp
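For anyone who wants to try the same comma clean-up outside of assembly, here is a minimal Python sketch of the behaviour described above (drop spaces before a comma, leave exactly one after it). It is not a translation of fixpunc; the function name `fix_commas` is made up for this illustration, and it only treats the plain space character (20h) as whitespace.

```python
import re


def fix_commas(text):
    """Remove spaces before each comma and leave exactly one space after it."""
    # " *" on both sides swallows any run of spaces around the comma;
    # the replacement puts back a single space after it.
    return re.sub(r" *, *", ", ", text)


if __name__ == "__main__":
    print(fix_commas("invoke MyAlgo ,eax,   123 , 456"))
```

Run as a preprocessing pass, this gives the tokenizer one predictable shape for argument separators without changing anything inside the arguments themselves.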
Quote from: NoCforMe on July 12, 2023, 07:53:48 AM
Not gonna happen for this demo.
:thumbsup:
As a further step, it can't be a big deal. It just requires another state. You can read space and comma in at least 2 different states... but I'm still in example 2 :biggrin:
No, it would be kind of a big deal. Here's the thing: the tokenizer in the demo (not the parser) looks for all non-numeric chunks of text as "identifiers". This includes
- variable names
- register names
- the tokens "INVOKE" and "ADDR"
none of which contain spaces.
To allow constructs like [RAX + RDX].Table + 2, the tokenizer would have to be expanded to cover expressions within square brackets and arithmetic expressions. Plus the parser would have to be able to follow these sequences. You can see that this is definitely a non-trivial process.
The tokens for that expression would be
- left square bracket ('[')
- register (RAX)
- plus sign ('+')
- register (RDX)
- right square bracket (']')
- period ('.')
- ID ('Table')
- plus sign ('+')
- number ('2')
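As a rough illustration of what such an expanded tokenizer would have to emit, here is a small Python sketch that splits an address expression into tokens of the kinds listed above. It uses a regular expression instead of the demo's table-driven FSA, so it shows the output, not the technique. The token names, the tiny register set and the function name `tokenize` are all assumptions of this sketch, and it only understands decimal numbers.

```python
import re

# Hypothetical, deliberately tiny register set for the illustration.
REGISTERS = {"RAX", "RBX", "RCX", "RDX"}

TOKEN_PATTERN = re.compile(r"""
    (?P<number>\d+)                   # decimal numbers only
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)  # register or identifier
  | (?P<punct>[\[\]+.])               # brackets, plus sign, period
  | (?P<space>\s+)                    # whitespace separates tokens
""", re.VERBOSE)

PUNCT_NAMES = {"[": "lbracket", "]": "rbracket", "+": "plus", ".": "period"}


def tokenize(text):
    """Return a list of (token type, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_PATTERN.match(text, pos)
        if m is None:
            raise ValueError(f"Tokenization error at column {pos}")
        pos = m.end()
        kind, value = m.lastgroup, m.group()
        if kind == "space":
            continue
        if kind == "word":
            kind = "register" if value.upper() in REGISTERS else "id"
        elif kind == "punct":
            kind = PUNCT_NAMES[value]
        tokens.append((kind, value))
    return tokens


if __name__ == "__main__":
    for token in tokenize("[RAX + RDX].Table + 2"):
        print(token)
```

Getting the tokens out is the easy half; the parser would still need rules saying which sequences of these tokens form a legal address expression.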
Before: buffera db "Invoke someproc, [RDX ].Table + 12, [RAX + RBX + Table], [RAX+ RBX +Table ]",0,0,0,0
Result: buffera db "Invoke someproc,[RDX].Table+12,[RAX+RBX+Table],[RAX+RBX+Table]",0,0,0,0 :tongue:
.data
TableDividers db ",", ".", "+", "-", "*","/","(",")","[","]",0
buffera db "Invoke someproc, [RDX ].Table + 12, [RAX + RBX + Table], [RAX+ RBX +Table ]",0,0,0,0,0,0
.code
Parser proc
LOCAL SaveRCX :QWORD,SaveRDX: QWORD, SaveRBX: QWORD
mov SaveRCX,rcx
mov SaveRDX,rdx
mov SaveRBX,rbx
lea rax,buffera ; source
mov rbx, rax ; dest
@@:
mov cl, byte ptr[rax]
add rax,1
test cl,cl
je ende
cmp cl,20h
jne @laba3
cmp byte ptr [rax],20h
lea rax,[rax+1]
je @b ; Skip more spaces
mov ch,byte ptr [rax-3]
sub rax,1
lea rdx, TableDividers
@laba1:
cmp ch,byte ptr [rdx]
je @b
cmp byte ptr [rdx],0 ;not found
lea rdx,[rdx+1]
jne @laba1
mov ch, byte ptr [rax]
lea rdx, TableDividers
@laba2:
cmp ch,byte ptr [rdx]
je @b
cmp byte ptr [rdx],0 ;not found
lea rdx,[rdx+1]
jne @laba2
@laba3:
mov byte ptr [rbx],cl
add rbx,1
jmp @b
ende:
mov dword ptr [rbx],0
lea rax,buffera
mov rcx, SaveRCX
mov rdx, SaveRDX
mov rbx, SaveRBX
ret
Parser endp
:tongue:
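For readers who don't want to trace the routine above by hand, the following Python sketch reproduces the Before/Result pair shown with it: spaces next to any character in the divider table are dropped, and remaining runs of spaces collapse to one. It is not a line-by-line translation of the assembly; the name `strip_divider_spaces` is made up, and the behaviour on inputs other than the sample is an assumption.

```python
import re

# Same characters as the TableDividers list: , . + - * / ( ) [ ]
DIVIDER = r"[,.+\-*/()\[\]]"


def strip_divider_spaces(text):
    """Remove spaces adjacent to a divider; collapse other space runs to one."""
    text = re.sub(r" *(" + DIVIDER + r") *", r"\1", text)
    return re.sub(r" {2,}", " ", text)


if __name__ == "__main__":
    before = ("Invoke someproc, [RDX ].Table + 12, "
              "[RAX + RBX + Table], [RAX+ RBX +Table ]")
    print(strip_divider_spaces(before))
```

The space between "Invoke" and the function name survives because neither neighbour is a divider, which is what lets the demo's tokenizer keep using it as a delimiter.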
OK, well, maybe ... I can see it does work, but it's basically cheating, and isn't that only designed for that particular address expression?
Also, suggestion: some comments would help. I can't figure out from glancing at it just what the hell your code does.
But interesting. A+ for cleverness. (Oh, and I really like your animated avatar. Cuuute.)
Getting back to my demo, for anyone who's interested in the guts of the thing, this is the data structure that drives the whole parsing process (after tokenization):
;***** The Parsing Sequence *****
ParsingSequence LABEL $Pnode
_pn0 $Pnode <_pn1, $T_INVOKE, NULL>
DD -1
_pn1 $Pnode <_pn2, $T_ID, StoreFname>
DD -1
_pn2 $Pnode <_pn3, $T_comma, NULL>
DD -1
_pn3 $Pnode <_pn5, $T_ID, StoreArg>
$Pnode <_pn5, $T_number, StoreArg>
$Pnode <_pn5, $T_register, StoreArg>
$Pnode <_pn4, $T_ADDR, TagArgAsADDR>
DD -1
_pn4 $Pnode <_pn5, $T_ID, StoreArg>
DD -1
_pn5 $Pnode <_pn3, $T_comma, NULL>
$Pnode <NULL, $T_EOL, NULL>
DD -1
That's all, a linked list of $Pnode structures. You can follow it through:
1. At node _pn0, see the token INVOKE, go to _pn1.
2. At node _pn1, see the token ID (function name), call StoreFname(), go to _pn2.
3. At node _pn2, see a comma, go to _pn3.
4. At node _pn3, see the token ID, call StoreArg(), go to _pn5;
   or see the token number, call StoreArg(), go to _pn5;
   or see the token register, call StoreArg(), go to _pn5;
   or see the token ADDR, call TagArgAsADDR(), go to _pn4.
5. At node _pn4, see the token ID, call StoreArg(), go to _pn5.
6. At node _pn5, see a comma, go back to _pn3;
   or see the token EOL (end o' line), STOP (parser sees $T_EOL and ends processing).
So assuming your tokenizer gives you all the tokens your text contains, you only need to expand on this structure to do all kinds of parsing tasks (with some small stub subroutines to go along with it). That's the beauty of this method (if I do say so myself).
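Since the method isn't tied to assembly, here is a minimal C sketch of the same table and the loop that walks it. Treat it as an illustration under assumptions, not the code from the paper: the `Result` counters, the stub bodies, the fixed-width rows (a zero token plays the role of the `DD -1` terminator), the `STOP` marker and the error return for an unexpected token are all invented for this example. Only the node layout and the names StoreFname, StoreArg and TagArgAsADDR come from the table above.

```c
#include <stddef.h>

/* T_END = 0, so unused slots in a row terminate it automatically. */
enum Token { T_END = 0, T_INVOKE, T_ID, T_COMMA, T_NUMBER,
             T_REGISTER, T_ADDR, T_EOL };
enum { STOP = -1, MAX_ALTS = 5 };

/* Hypothetical result record: the stubs just count what they see. */
typedef struct { int fnames, args, addr_tags; } Result;
typedef void (*Action)(Result *);

static void StoreFname(Result *r)   { r->fnames++; }
static void StoreArg(Result *r)     { r->args++; }
static void TagArgAsADDR(Result *r) { r->addr_tags++; }

/* One alternative within a node: next node, expected token, optional stub. */
typedef struct { int next; enum Token token; Action action; } Pnode;

static const Pnode sequence[6][MAX_ALTS] = {
    /* _pn0 */ { {1, T_INVOKE, NULL} },
    /* _pn1 */ { {2, T_ID, StoreFname} },
    /* _pn2 */ { {3, T_COMMA, NULL} },
    /* _pn3 */ { {5, T_ID, StoreArg}, {5, T_NUMBER, StoreArg},
                 {5, T_REGISTER, StoreArg}, {4, T_ADDR, TagArgAsADDR} },
    /* _pn4 */ { {5, T_ID, StoreArg} },
    /* _pn5 */ { {3, T_COMMA, NULL}, {STOP, T_EOL, NULL} },
};

/* tokens must end with T_EOL. Returns 1 on success, 0 on an
   unexpected token (no alternative in the current node matched). */
int parse(const enum Token *tokens, Result *r) {
    int node = 0;
    for (;;) {
        const Pnode *alt = sequence[node];
        while (alt->token != T_END && alt->token != *tokens)
            alt++;                      /* try the next alternative */
        if (alt->token == T_END)
            return 0;                   /* ran off the end of the node */
        if (alt->action != NULL)
            alt->action(r);             /* call the stub, if any */
        if (alt->next == STOP)
            return 1;                   /* saw T_EOL at _pn5 */
        node = alt->next;
        tokens++;
    }
}
```

Feeding it the tokens for something like `INVOKE someproc, ADDR x, 12` (INVOKE, ID, comma, ADDR, ID, comma, number, EOL) walks _pn0 through _pn5 exactly as in the numbered steps. Extending the grammar means adding rows and stubs; the loop itself doesn't change.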
Quote from: HSE on July 12, 2023, 02:02:53 PM
but I'm still in example 2 :biggrin:
Ok! The second essay example is working, in a not very impressive way :biggrin::
$parseSuccess
Press any key to continue ...
It's just the skeleton, to see how the parser works in a debugger.
Because the code is already in the text, I added a little challenge: to make it Neutral Bitness Code (Friedrich et al. syntax).
Then you can build the same code with ML or ML64, using the MASM32 SDK or the MASM64 SDK, obviously resulting in a 32-bit or 64-bit binary file.
Probably Hutch would have found it funny that macros developed for 64-bits are also used in 32-bits :smiley:
Wow. I'm impressed. Also honored that you took the time to actually work through that example. I take that as some kind of compliment.
So after writing this, what do you think? Was it worthwhile? Do you think you might ever actually use this for a parsing task?
It'd be cool if you did, and to see what modifications you make (besides making the code 64-bit friendly).
:thumbsup:
Quote from: NoCforMe on July 16, 2023, 09:19:18 AM
So after writing this
It's just a first step to see how that works :biggrin:
Quote from: HSE on July 16, 2023, 08:58:45 AM
Probably Hutch would have found it funny that macros developed for 64-bits are also used in 32-bits :smiley:
Yeah, funny (https://masm32.com/board/index.php?topic=10958.0), isn't it :mrgreen: