Command line parsing algorithms.

hutch-- · March 10, 2019, 11:07:21 AM

These are a set of command line parsing algorithms designed to be simple to use. There is an algorithm that counts the arguments passed on the command line, using the ";" character as the delimiter, and 5 versions depending on the arg count. Later there will be a couple of array-based versions: one using a delimiter, and the other using spaces as the delimiter, with quote support for arguments that have embedded spaces.

They are testing well at the moment, and once testing is finished they are candidates for the library.

sinsi · March 10, 2019, 03:55:19 PM

A few questions about arg_cnt


    mov r10, rcx
...
    mov r11, rcx

Why not use RCX to begin with?

Unless there are more than 4 billion args, isn't it smaller and faster to use EAX (or even AL) instead of RAX?

Nitpick: rdx = delimiter, really DL = delimiter since RDX isn't actually used.

What about unicode?

hutch-- · March 10, 2019, 04:46:30 PM

The calling convention will use RDX as the second arg anyway, so it's simple enough to use DL in the loop code.

> isn't it smaller and faster to use EAX (or even AL) instead of RAX?

It may be smaller, but not faster. Right from the beginning with 64-bit code I have used native-sized 64-bit registers where possible to avoid mixed-size code, as I only build /LARGEADDRESSAWARE code. You are right that RCX could have been used, which would drop 1 instruction, but it's not in the loop code, so it's not as if it matters much. These are candidates for the library, so there is some room to tweak them a little further.

It is largely habit that I start with the high registers, from r11 down; it began as a way to shift away from the old 32-bit habits, where you had far fewer registers to work with.

TimoVJL · March 10, 2019, 09:16:45 PM

@Steve H
For x64, /LARGEADDRESSAWARE is already the linker default, so the only option that has any effect for x64 is /LARGEADDRESSAWARE:NO.

hutch-- · March 10, 2019, 10:38:25 PM

I have heard this, but MASM is not a Microsoft C compiler. Usually people set the NO option when they try to use a mnemonic that is not allowed in x64; the solution is to learn the correct codings that work in x64.

hutch-- · March 10, 2019, 10:40:14 PM

sinsi,

Here is the next version. It's a bit smaller and cleaner, but it's not as if it matters much.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    LOCAL sPtr :QWORD
    LOCAL acnt :QWORD

    sas sPtr," 0ne ; two ; three ; four ; five;six ; seven ; eight"
    mov acnt, rvcall(arg_cnt,sPtr)
    conout "Argument count = ",str$(acnt),lf

    waitkey
    .exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

NOSTACKFRAME

arg_cnt proc
  ; -------------------------------
  ; counts ";" delimiter and adds 1
  ; rcx = string address
  ; -------------------------------
    cmp BYTE PTR [rcx], 0       ; exit with error on null string
    je error

    xor rax, rax                ; set rax to 0
    sub rcx, 1
  @@:
    add rcx, 1
    cmp BYTE PTR [rcx], 0       ; test for terminator
    je @F
    cmp BYTE PTR [rcx], 59      ; test if delimiter
    jne @B
    add rax, 1                  ; increment the arg count
    jmp @B

  @@:
    add rax, 1                  ; return delimiter count + 1
    ret

  error:
    xor rax, rax                ; exit on empty string with rax = 0
    ret

arg_cnt endp

STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

sinsi · March 10, 2019, 11:18:40 PM

>but its not like it matters much
Every little bit helps...

jj2007 · March 10, 2019, 11:37:18 PM

Yeah... but x64 instructions are notoriously longer. An inc rcx is just one byte shorter than add rcx, 1. So at most the routine could be only 5 bytes shorter, including a dirty hack like cmp BYTE PTR [rcx], ah ; test for terminator ;)

Here is the extra short version, 28 bytes:


arg_cnt:
  ; -------------------------------
  ; counts ";" delimiter and adds 1
  ; rcx = string address
  ; -------------------------------
    xor eax, eax           	; set rax to 0
    cmp BYTE PTR [rcx], al       ; exit with error on null string
    je error

    dec rcx
  @@:
    inc rcx
    cmp BYTE PTR [rcx], ah       ; test for terminator
    je @F
    cmp BYTE PTR [rcx], 59      ; test if delimiter
    jne @B
    inc eax                  ; increment the arg count
    jmp @B

  @@:
    inc eax                  ; return delimiter count + 1
  error:
    ret 0

hutch-- · March 10, 2019, 11:48:23 PM

I don't particularly lose any sleep over instruction length, because 64-bit processors run 64-bit instructions natively. Yes, you can often use a 32-bit register, but the processor still reads the 64-bit register, so the only gain was when porting 32-bit legacy code to 64 bit.

jj2007 · March 11, 2019, 12:03:56 AM

Well, it's 41->28 bytes. Speedwise it may become a problem if you frequently use the function in loops with a million iterations. Btw, with optimisations on, 64-bit compilers use these tricks with 32-bit registers.

sinsi · March 11, 2019, 12:15:16 AM

I was thinking more along the lines of extra bytes adding up over all procs, not calling one longer proc a million times.
If you are using RAX when a 32-bit number is all you ever use, it typically adds a REX prefix byte to each such instruction (I would imagine speed stays the same).
Over a few thousand lines of code it could add up to a few cache misses etc.


cmp BYTE PTR [rcx], ah
Too bad if there are more than 255 args (yes I have seen this once)

TimoVJL · March 11, 2019, 03:22:45 AM

Quote from: hutch-- on March 10, 2019, 10:38:25 PM
I have heard this, but MASM is not a Microsoft C compiler. Usually people set the NO option when they try to use a mnemonic that is not allowed in x64; the solution is to learn the correct codings that work in x64.

It's not a cl issue: on its own, the /LARGEADDRESSAWARE option is useless for x64 and has no effect at all, because it is already the default.
Just use PEView or a similar tool to check the IMAGE_FILE_LARGE_ADDRESS_AWARE bit after linking.

hutch-- · March 11, 2019, 04:52:26 AM

From long ago I have kept hearing the assumption that shorter code, in terms of byte length, is supposed to be faster, but the clock does not agree with that assumption, as it was based on pre-i486 hardware in 16-bit real mode. It vaguely mattered in MS-DOS COM files, but with the i486 came pipelines and the beginning of instruction scheduling, and instructions are still read complete, not in part. An instruction muncher does not care about instruction size as long as the processor and OS version are capable of reading that instruction. I am still fascinated by why MS-DOS assumptions linger on in 32- and 64-bit coding practices.

Whatever the perceived advantages of any given algo design may happen to be, look at the alignment padding at the end of the procedure and you have thrown most of it away most of the time. If an algorithm is so long that it is affected by cache problems, redesign it. The ultimate test is, as usual, the clock; the rest does not matter.

RE: C compiler optimisation.
It was not that long ago that optimising C compilers loaded an immediate into a register before performing an ADD. As before, the clock is the one that matters.

Timo, this is what Microsoft have to say about /LARGEADDRESSAWARE:

Quote
The /LARGEADDRESSAWARE option tells the linker that the application can handle addresses larger than 2 gigabytes. In the 64-bit compilers, this option is enabled by default. In the 32-bit compilers, /LARGEADDRESSAWARE:NO is enabled if /LARGEADDRESSAWARE is not otherwise specified on the linker line.

If the only gain is in typing a shorter link line, I don't have a problem with it, as I auto-generate the batch files I use to build 64-bit MASM binaries.

daydreamer · March 12, 2019, 04:23:21 AM

I also like that my CPU is a 3.5 GHz+ number cruncher, and my favourite instruction sets are the biggest ones, but I believe the newest CPUs are better designed to handle large opcodes and large data sizes and to calculate numbers faster.

And /LARGEADDRESSAWARE: I would really like to see a MASM64 or MASM32 program take advantage of that kind of memory allocation size, not here but probably in the game forum, maybe reading in compressed data and storing it uncompressed.

jj2007 · March 12, 2019, 06:44:10 AM

The issue is not the length of an instruction. If it's in the cache, it doesn't matter whether it's a one-byter like lodsb or a longer instruction such as inc eax (two bytes in 64-bit mode). But if a loop doesn't fit into the instruction cache, the CPU must reload the instructions, and that obviously costs cycles. Therefore shorter instructions allow more complex loops.

https://stackoverflow.com/questions/22921373/how-to-write-instruction-cache-friendly-program-in-c

The MASM Forum
