The MASM Forum

Microsoft 64 bit MASM => MASM64 SDK => Topic started by: hutch-- on March 10, 2019, 11:07:21 AM

Title: Command line parsing algorithms.
Post by: hutch-- on March 10, 2019, 11:07:21 AM
This is a set of command line parsing algorithms designed to be simple to use. One algorithm counts the arguments passed on the command line, using the ";" character as the delimiter, and there are 5 versions depending on the arg count. Later there will be a couple of array based versions, one using a delimiter and the other using spaces as the delimiter, with quote support for arguments that have embedded spaces.

They are testing well at the moment and, once testing is finished, they are library candidates.
Title: Re: Command line parsing algorithms.
Post by: sinsi on March 10, 2019, 03:55:19 PM
A few questions about arg_cnt

    mov r10, rcx
...
    mov r11, rcx

Why not use RCX to begin with?

Unless there are more than 4 billion args, isn't it smaller and faster to use EAX (or even AL) instead of RAX?

Nitpick: rdx = delimiter, really DL = delimiter since RDX isn't actually used.

What about unicode?
Title: Re: Command line parsing algorithms.
Post by: hutch-- on March 10, 2019, 04:46:30 PM
The calling convention will use RDX as the second arg anyway so it's simple enough to use DL in the loop code.
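
A minimal sketch of that idea, assuming a hypothetical variant named arg_cnt_dl that takes the delimiter as a second argument (the name, the labels and the use of r8 as scratch are illustrative, not from the library). Under the Win64 calling convention the second integer argument arrives in RDX, and only its low byte, DL, is compared in the loop:

```asm
NOSTACKFRAME

arg_cnt_dl proc                     ; hypothetical name, for illustration only
  ; -------------------------------
  ; rcx = string address
  ; dl  = delimiter character
  ; -------------------------------
    xor rax, rax                    ; arg count = 0
    cmp BYTE PTR [rcx], 0           ; empty string returns 0
    je scan_exit
  scan_next:
    movzx r8, BYTE PTR [rcx]        ; read the current character once
    add rcx, 1
    test r8, r8                     ; terminator ends the scan
    jz scan_end
    cmp r8b, dl                     ; only the low byte of rdx is used
    jne scan_next
    add rax, 1                      ; one more delimiter seen
    jmp scan_next
  scan_end:
    add rax, 1                      ; return delimiter count + 1
  scan_exit:
    ret
arg_cnt_dl endp

STACKFRAME
```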

>  isn't it smaller and faster to use EAX (or even AL) instead of RAX?

It may be smaller but not faster. Right from the beginning with 64 bit code I have used native sized 64 bit registers where possible to avoid mixed size code, as I only build /LARGEADDRESSAWARE code. You are right that RCX could have been used and it would drop 1 instruction, but it's not in the loop code so it's not like it matters much. These are candidates for the library so there is some room to tweak them a little further.

It is mostly habit that I start with the high registers first, from r11 down. It was a way to shift from the old habits in 32 bit, where you had far fewer registers to work with.
Title: Re: Command line parsing algorithms.
Post by: TimoVJL on March 10, 2019, 09:16:45 PM
@Steve H
/LARGEADDRESSAWARE is the linker default for x64, so the only option that has any effect for x64 is /LARGEADDRESSAWARE:NO.
Title: Re: Command line parsing algorithms.
Post by: hutch-- on March 10, 2019, 10:38:25 PM
I have heard this but MASM is not a Microsoft C compiler. Usually people set the NO option when they try to use a mnemonic that is not allowed in x64. The solution is to learn the correct codings that work in x64.
Title: Re: Command line parsing algorithms.
Post by: hutch-- on March 10, 2019, 10:40:14 PM
sinsi,

Here is the next version, it's a bit smaller and cleaner but it's not like it matters much.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    LOCAL sPtr  :QWORD
    LOCAL acnt  :QWORD

    sas sPtr,"  0ne ;  two ;   three  ; four ;  five;six  ; seven ; eight"
    mov acnt, rvcall(arg_cnt,sPtr)
    conout "Argument count = ",str$(acnt),lf

    waitkey
    .exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

NOSTACKFRAME

arg_cnt proc
  ; -------------------------------
  ; counts ";" delimiter and adds 1
  ; rcx = string address
  ; -------------------------------
    cmp BYTE PTR [rcx], 0       ; exit with error on null string
    je error

    xor rax, rax                ; set rax to 0
    sub rcx, 1
  @@:
    add rcx, 1
    cmp BYTE PTR [rcx], 0       ; test for terminator
    je @F
    cmp BYTE PTR [rcx], 59      ; test if delimiter
    jne @B
    add rax, 1                  ; increment the arg count
    jmp @B

  @@:
    add rax, 1                  ; return delimiter count + 1
    ret

  error:
    xor rax, rax                ; exit on empty string with rax = 0
    ret

arg_cnt endp

STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

Title: Re: Command line parsing algorithms.
Post by: sinsi on March 10, 2019, 11:18:40 PM
>but its not like it matters much
Every little bit helps...
Title: Re: Command line parsing algorithms.
Post by: jj2007 on March 10, 2019, 11:37:18 PM
Yeah... but x64 instructions are notoriously longer. An inc rcx is just one byte shorter than add rcx, 1. So at max the routine could be only 5 bytes shorter, including a dirty hack like cmp BYTE PTR [rcx], ah  ; test for terminator  ;)

Here is the extra short version, 28 bytes:
arg_cnt:
  ; -------------------------------
  ; counts ";" delimiter and adds 1
  ; rcx = string address
  ; -------------------------------
    xor eax, eax            ; set rax to 0
    cmp BYTE PTR [rcx], al       ; exit with error on null string
    je error

    dec rcx
  @@:
    inc rcx
    cmp BYTE PTR [rcx], ah       ; test for terminator
    je @F
    cmp BYTE PTR [rcx], 59      ; test if delimiter
    jne @B
    inc eax                  ; increment the arg count
    jmp @B

  @@:
    inc eax                  ; return delimiter count + 1
  error:
    ret 0
Title: Re: Command line parsing algorithms.
Post by: hutch-- on March 10, 2019, 11:48:23 PM
I don't particularly lose any sleep over the instruction length because 64 bit processors run 64 bit instructions natively. Yes you can often use a 32 bit register but the processor still reads the 64 bit register so the only gain was trying to port 32 bit legacy code to 64 bit.
Title: Re: Command line parsing algorithms.
Post by: jj2007 on March 11, 2019, 12:03:56 AM
Well, it's 41->28 bytes. Speedwise it may become a problem if you frequently use the function in loops with a million iterations. Btw with optimisations on, 64-bit compilers use these tricks with 32-bit registers.
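
For reference, the two byte counts can be reproduced by hand. The listing below is the 41 byte version again with the size of each instruction in the comment, assuming short (2 byte) jumps and the usual ML64 encodings. The figure in brackets is the size of the replacement used in the 28 byte version:

```asm
    cmp BYTE PTR [rcx], 0       ; 3  (2 as cmp BYTE PTR [rcx], al)
    je error                    ; 2
    xor rax, rax                ; 3  (2 as xor eax, eax)
    sub rcx, 1                  ; 4  (3 as dec rcx)
  @@:
    add rcx, 1                  ; 4  (3 as inc rcx)
    cmp BYTE PTR [rcx], 0       ; 3  (2 as cmp BYTE PTR [rcx], ah)
    je @F                       ; 2
    cmp BYTE PTR [rcx], 59      ; 3
    jne @B                      ; 2
    add rax, 1                  ; 4  (2 as inc eax)
    jmp @B                      ; 2
  @@:
    add rax, 1                  ; 4  (2 as inc eax)
    ret                         ; 1
  error:
    xor rax, rax                ; 3  (0, eax is already zero on this path)
    ret                         ; 1  (0, shares the ret above)
                                ; total 41, or 28 with the replacements
```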
Title: Re: Command line parsing algorithms.
Post by: sinsi on March 11, 2019, 12:15:16 AM
I was thinking more along the lines of extra bytes adding up over all procs, not calling one longer proc a million times.
If you are using RAX when a 32-bit number is all you ever use, it doubles the instruction length (I would imagine speed stays the same).
Over a few thousand lines of code it could add up to a few cache misses etc.

cmp BYTE PTR [rcx], ah
Too bad if there are more than 255 args (yes I have seen this once) :biggrin:
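
The reason is that AH is bits 8 to 15 of EAX, so it stays 0 only while the running count is below 256. Once 256 delimiters have been counted AH becomes 1, the terminator test no longer matches a 0 byte, and the scan runs past the end of the string. A sketch of the short version with that one compare changed back to an immediate (the label names arg_cnt_safe and cnt_done are made up for illustration), which should come to 29 bytes rather than 28:

```asm
arg_cnt_safe:                   ; hypothetical name
  ; rcx = string address
    xor eax, eax                ; rax = 0
    cmp BYTE PTR [rcx], al      ; al is still 0 here, so this compare is safe
    je cnt_done                 ; empty string returns 0

    dec rcx
  @@:
    inc rcx
    cmp BYTE PTR [rcx], 0       ; immediate 0, independent of the count in eax
    je @F
    cmp BYTE PTR [rcx], 59      ; test if delimiter
    jne @B
    inc eax                     ; increment the arg count
    jmp @B

  @@:
    inc eax                     ; return delimiter count + 1
  cnt_done:
    ret
```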
Title: Re: Command line parsing algorithms.
Post by: TimoVJL on March 11, 2019, 03:22:45 AM
Quote from: hutch-- on March 10, 2019, 10:38:25 PM
I have heard this but MASM is not a Microsoft C compiler, usually people set the NO option when they try to use a mnemonic that is not allowed in x64. The solution is to learn the correct codings that work in x64.
Not a cl issue, as the command line option /LARGEADDRESSAWARE on its own is quite useless with x64. It has no effect at all, as it is the default.
Just use PEView or similar to check the IMAGE_FILE_LARGE_ADDRESS_AWARE bit after linking.
Title: Re: Command line parsing algorithms.
Post by: hutch-- on March 11, 2019, 04:52:26 AM
From long ago I have kept hearing the assumption that shorter code in terms of byte length is supposed to be faster but the clock does not agree with that assumption as it was based off the pre-i486 hardware in 16 bit real mode. It vaguely mattered in MS-DOS COM files but with the i486 came pipelines and the beginning of instruction scheduling and instructions are still read complete, not in part. An instruction muncher does not care about the instruction size as long as the processor and OS version are capable of reading that instruction. I am still fascinated why MS-DOS assumptions linger on in 32 and 64 bit coding practices.

Whatever the perceived advantages may happen to be with any given algo design, look at the alignment padding at the end of the procedure and you have thrown most of it away most of the time. If an algorithm is so long that it is affected by cache problems, redesign it. The ultimate test is, as usual, the clock; the rest does not matter.

RE : C compiler optimisation.
It was not that long ago that optimising C compilers loaded an immediate into a register before performing an ADD. As before, the clock is the one that matters.

Timo, this is what Microsoft have to say about /LARGEADDRESSAWARE
Quote
The /LARGEADDRESSAWARE option tells the linker that the application can handle addresses larger than 2 gigabytes. In the 64-bit compilers, this option is enabled by default. In the 32-bit compilers, /LARGEADDRESSAWARE:NO is enabled if /LARGEADDRESSAWARE is not otherwise specified on the linker line.

If the only gain is in typing a shorter link line, I don't have a problem with it as I auto generate the batch files I use to build 64 bit MASM binaries.
Title: Re: Command line parsing algorithms.
Post by: daydreamer on March 12, 2019, 04:23:21 AM
I also like that my CPU is a 3.5 GHz+ number cruncher, and my favourite instruction sets are the biggest ones, but I believe the newest CPUs are better designed to handle large opcodes and large data sizes and to calculate numbers faster.

As for /LARGEADDRESSAWARE, I would really like to see a masm64 or masm32 program take advantage of that kind of memory allocation size. Not here, but probably in the game forum, maybe reading in compressed data and storing it uncompressed.

Title: Re: Command line parsing algorithms.
Post by: jj2007 on March 12, 2019, 06:44:10 AM
The issue is not the length of an instruction. If it's in the cache, it doesn't matter if it's a one-byter like inc eax or lodsb. But if a loop doesn't fit into the instruction cache, the cpu must reload the instructions, and that obviously costs cycles. Therefore shorter instructions allow more complex loops.

https://stackoverflow.com/questions/22921373/how-to-write-instruction-cache-friendly-program-in-c
Title: Re: Command line parsing algorithms.
Post by: hutch-- on March 12, 2019, 10:47:21 AM
I am much of the view that if an algorithm does not fit into the cache, it either needs to be re-written or, if it is that long by necessity, there is little you can do about it. When you use old instructions like LODSB without the REP prefix you get a serious performance penalty which byte level reduction will not compensate for. With loop code there is a simple technique to test cache effects: unroll the loop until the loop timing gets slower, then back it off until it's near its fastest. Slightly less is better, as different processors respond differently to how much loop unrolling is used.
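
As an illustration of the unrolling step only (the timing itself needs a test bed), here is a sketch of a ";" counting scan loop unrolled by 2. The name count_args_x2 and the label names are hypothetical. Reading [rcx+1] is safe because it only happens after the byte at [rcx] has been seen to be non-zero, so the terminator cannot have been passed:

```asm
NOSTACKFRAME

count_args_x2 proc              ; hypothetical name, scan loop unrolled by 2
  ; rcx = string address
    xor rax, rax                ; arg count = 0
    cmp BYTE PTR [rcx], 0       ; empty string returns 0
    je scan_exit
  scan_next:
    movzx rdx, BYTE PTR [rcx]   ; first character of the pair
    test rdx, rdx               ; test for terminator
    jz scan_end
    cmp rdx, 59                 ; test if delimiter
    jne @F
    add rax, 1                  ; increment the arg count
  @@:
    movzx rdx, BYTE PTR [rcx+1] ; second character of the pair
    add rcx, 2                  ; step over both characters
    test rdx, rdx               ; test for terminator
    jz scan_end
    cmp rdx, 59                 ; test if delimiter
    jne scan_next
    add rax, 1                  ; increment the arg count
    jmp scan_next
  scan_end:
    add rax, 1                  ; return delimiter count + 1
  scan_exit:
    ret
count_args_x2 endp

STACKFRAME
```

Unrolling further repeats the second block with [rcx+2], [rcx+3] and so on, which is where the timing decides how far it is worth going.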

From the Intel manual:

In 64-bit mode, INC r16 and INC r32 are not encodable (because opcodes 40H through 47H are REX prefixes).
Otherwise, the instruction's 64-bit mode default operation size is 32 bits. Use of the REX.R prefix permits access to
additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits.


INC DEC versus ADD SUB. See this link for a discussion on it.
https://stackoverflow.com/questions/36510095/inc-instruction-vs-add-1-does-it-matter

I read the article you posted but it's old stuff aimed at people writing C/C++ code. Applying this C/C++ theory to direct assembler is like the cart pushing the horse. Branch reduction has been with us for a long time, so has instruction count reduction, and the one that really matters is memory operand reduction, as you will see when timing an algorithm.
Title: Re: Command line parsing algorithms.
Post by: hutch-- on March 13, 2019, 01:59:52 PM
This is probably the version I will add to the library for counting arguments separated by a delimiter. An extra instruction in the loop but only 1 memory read per iteration and full 64 bit registers with no partial register reads or writes.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

NOSTACKFRAME

count_args proc
  ; -------------------------------
  ; counts ";" delimiter and adds 1
  ; rcx = string address
  ; -------------------------------
    cmp BYTE PTR [rcx], 0       ; exit with error on null string
    je error

    xor rax, rax                ; set rax to 0
    sub rcx, 1
  @@:
    add rcx, 1
    movzx rdx, BYTE PTR [rcx]   ; zero extend byte to rdx
    test rdx, rdx               ; test for terminator
    jz @F                       ; exit loop on 0
    cmp rdx, 59                 ; test if delimiter
    jnz @B                      ; loop back if not delimiter
    add rax, 1                  ; increment the arg count
    jmp @B                      ; loop back

  @@:
    add rax, 1                  ; return delimiter count + 1
    ret

  error:
    xor rax, rax                ; exit on empty string with rax = 0
    ret

count_args endp

STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Title: Re: Command line parsing algorithms.
Post by: sinsi on March 13, 2019, 02:24:38 PM
    xor eax,eax
    cmp [rcx],al
    jz done
  @@:
    mov dl,[rcx]
    test dl,dl
    jz @F
    inc rcx
    cmp dl,";"
    jnz @B
    inc eax
    jmp @B
  @@:
    inc eax
  done:
    ret

edited: to return 0 on empty string
Title: Re: Command line parsing algorithms.
Post by: hutch-- on March 13, 2019, 08:12:53 PM
There is only one integer register size on a 64 bit processor, it's 64 bit, which for the accumulator is rax. You get eax, ax and al through masking of rax. I avoid this by sticking with the native register size, which in this context is 64 bit. Write an algo of either type, put them into an app, then look at the padding after either proc and your gain in size reduction is wasted.
Title: Re: Command line parsing algorithms.
Post by: sinsi on March 14, 2019, 07:10:45 PM
The default operand size is still 32 bit, so using 64 bits unnecessarily imposes a penalty - not speed but size.
They even made it easy for us, "xor rax,rax" is exactly the same as "xor eax,eax" but needs the REX prefix byte.
In the context of a 10,000 line program it can make a big difference.
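
The equivalence comes from the rule that a write to a 32 bit register zero-extends into the full 64 bit register, while writes to the 16 and 8 bit registers leave the upper bits alone. A small fragment to show it, with the resulting register contents in the comments:

```asm
    mov rax, -1                 ; rax = 0FFFFFFFFFFFFFFFFh
    xor eax, eax                ; rax = 0, 2 byte instruction

    mov rax, -1
    xor rax, rax                ; rax = 0, 3 bytes (REX.W prefix plus the same 2 bytes)

    mov rax, -1
    xor ax, ax                  ; rax = 0FFFFFFFFFFFF0000h, upper 48 bits untouched

    mov rax, -1
    xor al, al                  ; rax = 0FFFFFFFFFFFFFF00h, upper 56 bits untouched
```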

I just think it's a bad habit to get into. Don't get blinded by "64 bits".
Title: Re: Command line parsing algorithms.
Post by: hutch-- on March 14, 2019, 11:01:23 PM
 :biggrin:

Relying on a one horse trick to port 32 bit to 64 bit is itself a risky path. XOR does work the same with EAX but not with AX or AL. The only gain you can get is a reduction in the size of some instructions but as I have commented before, you lose that in most instances with the alignment padding after each procedure. With normal 16 byte alignment you have to try and save enough bytes at whatever cost to avoid the next 16 byte boundary and unless the procedure is long enough to do that you gain nothing.
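
As a worked example of the padding arithmetic, assuming each procedure starts on a 16 byte boundary, a procedure occupies its length rounded up to the next multiple of 16. The symbol names below are made up for illustration, the 41 and 28 byte lengths are the two arg_cnt versions from earlier in the topic, and the 36 byte case is a hypothetical 5 byte saving:

```asm
len_long    equ 41                              ; arg_cnt with 64 bit registers
len_short   equ 28                              ; the 28 byte version
len_mid     equ 36                              ; hypothetical: 41 less a 5 byte saving

slot_long   equ (len_long  + 15) and (not 15)   ; 48, 7 bytes of padding
slot_short  equ (len_short + 15) and (not 15)   ; 32, 4 bytes of padding
slot_mid    equ (len_mid   + 15) and (not 15)   ; 48, same slot as the 41 byte version
```

So a saving only shows up in the image when it crosses a 16 byte boundary, as the 13 byte difference does here and the 5 byte one does not.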

64 bit code does end up a bit larger but nothing like twice the size of comparable 32 bit code, and it has the great advantages of many more integer registers and far more memory than the effective 2 gig limit of a single allocation in 32 bit. From memory a Win10 64 box with enough memory can address up to 128 gig. I can routinely test on 32 gig and you would be surprised just how slow traditional 32 bit design is on memory that big.

I am still fascinated that the pre i486 style of code lingers on. The days of real mode pre-286 are long over, and while it did make some difference with DOS COM files if you were up near the 64k limit, it's a waste of time on anything from the i486 upwards. I remember some folks waxing lyrical about using short jumps to save space but it never went any faster.
Title: Re: Command line parsing algorithms.
Post by: Alex81524 on March 26, 2019, 08:31:46 PM
hutch—
When do you decide to rename your environment to masm64? Let it be version 1. Everybody really needs it. Best regards, Alex
Title: Re: Command line parsing algorithms.
Post by: hutch-- on March 26, 2019, 11:01:17 PM
I got your last "reminder" but it's the same problem. Not only do I NOT own the domain name, but it would involve rewriting all of the source code, as it intentionally uses hard coded paths to avoid picking up the incorrect binaries, libraries and the like.
Title: Re: Command line parsing algorithms.
Post by: aw27 on March 26, 2019, 11:21:28 PM
Quote from: Alex81524 on March 26, 2019, 08:31:46 PM
hutch—
When do you decide to rename your environment to masm64? Let it be version 1. Everybody really needs it. Best regards, Alex
I am waiting as well for Microsoft to rename the System32 folder to System64 and update kernel32.dll to kernel64.dll. Until then we will never have a real 64-bit operating system.  :lol:
Title: Re: Command line parsing algorithms.
Post by: jj2007 on March 27, 2019, 01:33:02 AM
Quote from: hutch-- on March 26, 2019, 11:01:17 PM
as it intentionally uses hard coded paths to avoid picking up the incorrect binaries, libraries and the like.

Given the incredible mess with "flexible" paths in other languages, I herewith suggest Hutch for the Nobel Prize for Relaxed Programming 8)

Google "C++" "path" "not found": About 1,600,000 results :bgrin: