Command line parsing algorithms.

Started by hutch--, March 10, 2019, 11:07:21 AM

hutch--

#15
I am much of the view that if an algorithm does not fit into the cache, it either needs to be re-written or, if it is that long by necessity, there is little you can do about it. When you use old instructions like LODSB without the REP prefix you get a serious performance penalty which byte level size reduction will not compensate for. With loop code there is a simple technique to test cache effects: unroll the loop until the loop timing gets slower, then back it off until it is near its fastest. Slightly less is better, as different processors respond differently to how much loop unrolling is used.
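To make the unrolling step concrete, here is a minimal sketch of a byte scanning loop unrolled by two. The routine name, the labels and the string length task are hypothetical choices for illustration, not code from this thread, and the sketch assumes the procedure is assembled without an automatic stack frame (NOSTACKFRAME in the SDK).

```asm
; Hypothetical example: length of a zero terminated string.
; rcx = string address, returns the length in rax.
; The loop body handles two bytes per pass (unrolled by 2).

scan_len_x2 proc
    mov rax, rcx                    ; rax walks the string
  next_pair:
    movzx rdx, BYTE PTR [rax]       ; first byte of the pair
    test rdx, rdx
    jz finished                     ; terminator in the first slot
    movzx rdx, BYTE PTR [rax+1]     ; second byte of the pair
    add rax, 2                      ; step past both bytes
    test rdx, rdx
    jnz next_pair                   ; one taken branch per two bytes
    sub rax, 1                      ; terminator was in the second slot
  finished:
    sub rax, rcx                    ; length = end address - start address
    ret
scan_len_x2 endp
```

The test would be to time this against the one byte per pass form, then try four and eight bytes per pass and stop just before the timings get worse.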

As per the Intel manual:

In 64-bit mode, INC r16 and INC r32 are not encodable (because opcodes 40H through 47H are REX prefixes).
Otherwise, the instruction's 64-bit mode default operation size is 32 bits. Use of the REX.R prefix permits access to
additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits.


INC DEC versus ADD SUB. See this link for a discussion on it.
https://stackoverflow.com/questions/36510095/inc-instruction-vs-add-1-does-it-matter

I read the article you posted but it is old stuff aimed at people writing C/C++ code; applying this C/C++ theory to direct assembler is like the cart pushing the horse. Branch reduction has been with us for a long time, and so has instruction count reduction, but the one that really matters is memory operand reduction, as you will see when timing an algorithm.

hutch--

This is probably the version I will add to the library for counting arguments separated by a delimiter. It has an extra instruction in the loop but only 1 memory read per iteration, and it uses full 64 bit registers with no partial register reads or writes.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

NOSTACKFRAME

count_args proc
  ; -------------------------------
  ; counts ";" delimiter and adds 1
  ; rcx = string address
  ; -------------------------------
    cmp BYTE PTR [rcx], 0       ; exit with error on null string
    je error

    xor rax, rax                ; set rax to 0
    sub rcx, 1
  @@:
    add rcx, 1
    movzx rdx, BYTE PTR [rcx]   ; zero extend byte to rdx
    test rdx, rdx               ; test for terminator
    jz @F                       ; exit loop on 0
    cmp rdx, 59                 ; test if delimiter
    jnz @B                      ; loop back if not delimiter
    add rax, 1                  ; increment the arg count
    jmp @B                      ; loop back

  @@:
    add rax, 1                  ; return delimiter count + 1
    ret

  error:
    xor rax, rax                ; exit on empty string with rax = 0
    ret

count_args endp

STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
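A minimal usage sketch for the routine above, assuming it is called directly with the string address in rcx as its header comment states. The data label arg_list and the sample string are hypothetical.

```asm
.data
  arg_list db "one;two;three", 0    ; hypothetical test string, two delimiters

.code
    ; inside the calling procedure
    lea rcx, arg_list               ; rcx = string address
    call count_args                 ; rax = 3 (two ";" delimiters + 1)
```

Because the routine counts delimiters and adds 1, a trailing delimiter as in "a;b;" is reported as 3 arguments, and only a completely empty string returns 0.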

sinsi

#17
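    xor eax,eax         ; count = 0, also the return value for an empty string
    cmp [rcx],al        ; is the first byte the terminator?
    jz done             ; yes, return 0
  @@:
    mov dl,[rcx]        ; read the current byte
    test dl,dl          ; test for terminator
    jz @F               ; exit loop on 0
    inc rcx             ; advance to the next byte
    cmp dl,";"          ; test if delimiter
    jnz @B              ; loop back if not delimiter
    inc eax             ; increment the arg count
    jmp @B              ; loop back
  @@:
    inc eax             ; return delimiter count + 1
  done:
    ret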

edited: to return 0 on empty string

hutch--

There is only one integer register size on a 64 bit processor: it is 64 bit, which for the accumulator is rax. You get eax, ax and al through masking of rax. I avoid this by sticking with the native register size, which in this context is 64 bit. Write an algo of either type, put them into an app, then look at the padding after either proc: your gain in size reduction is wasted.

sinsi

The default operand size is still 32 bit, so using 64 bits unnecessarily imposes a penalty - not speed but size.
They even made it easy for us, "xor rax,rax" is exactly the same as "xor eax,eax" but needs the REX prefix byte.
In the context of a 10,000 line program it can make a big difference.

I just think it's a bad habit to get into. Don't get blinded by "64 bits".
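For reference, the size difference under discussion comes from the REX.W prefix byte and from ADD with an immediate versus INC. The sketch below lists the encoded lengths of the instruction forms used in the two routines above, assuming the standard x86-64 encodings. It assembles as written but is meant to be read as a comparison, not called.

```asm
; 64 bit operand forms              ; encoded length in bytes
    xor rax, rax                    ; 3
    movzx rdx, BYTE PTR [rcx]       ; 4
    test rdx, rdx                   ; 3
    cmp rdx, 59                     ; 4
    add rax, 1                      ; 4
    add rcx, 1                      ; 4

; 32 bit and 8 bit operand forms
    xor eax, eax                    ; 2  (a 32 bit write also zeroes the upper half of rax)
    mov dl, [rcx]                   ; 2  (writes only the low byte of rdx)
    test dl, dl                     ; 2
    cmp dl, ";"                     ; 3
    inc eax                         ; 2
    inc rcx                         ; 3  (a 64 bit pointer still needs REX.W)
```

Across these six instructions that is 22 bytes against 14.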

hutch--

 :biggrin:

Relying on a one horse trick to port 32 bit to 64 bit is itself a risky path. XOR does work the same with EAX but not with AX or AL. The only gain you can get is a reduction in the size of some instructions but, as I have commented before, you lose that in most instances to the alignment padding after each procedure. With normal 16 byte alignment you have to save enough bytes, at whatever cost, to pull the end of the procedure back across a 16 byte boundary, and unless the procedure is long enough to do that you gain nothing.
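The padding argument can be checked with a small sketch. The procedure names are hypothetical, and it assumes that no automatic stack frame is generated (NOSTACKFRAME in the SDK) and that the code section allows 16 byte alignment so that align 16 takes effect.

```asm
align 16
zero_a proc
    xor rax, rax        ; 3 bytes
    ret                 ; 1 byte, 4 bytes of code in total
zero_a endp

align 16                ; 12 bytes of padding are emitted here
zero_b proc
    xor eax, eax        ; 2 bytes
    ret                 ; 1 byte, 3 bytes of code in total
zero_b endp

align 16                ; 13 bytes of padding: the byte saved is absorbed
next_proc proc          ; starts 16 bytes after zero_b either way
    ret
next_proc endp
```

Either body occupies one 16 byte slot, so the shorter encoding only pays off when it moves the end of a procedure back across a 16 byte boundary.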

64 bit code does end up a bit larger but nothing like twice the size of comparable 32 bit code, and it has the great advantages of many more integer registers and far more memory than the effective 2 gig limit on a single allocation in 32 bit. From memory, a Win10 64 box with enough memory can address up to 128 gig. I can routinely test on 32 gig and you would be surprised just how slow traditional 32 bit design is on memory that big.

I am still fascinated that the pre i486 style of code lingers on. The days of real mode pre-286 are long over, and while it did make some difference with DOS COM files if you were up near the 64k limit, it is a waste of time on anything from the i486 upwards. I remember some folks waxing lyrical about using short jumps to save space but it never went any faster.

Alex81524

hutch--
When do you decide to rename your environment to masm64? Let it be version 1. Everybody really needs it. Best regards, Alex

hutch--

I got your last "reminder" but it is the same problem: not only do I NOT own the domain name, but it would involve rewriting all of the source code, as it intentionally uses hard coded paths to avoid picking up the incorrect binaries, libraries and the like.

aw27

Quote from: Alex81524 on March 26, 2019, 08:31:46 PM
hutch--
When do you decide to rename your environment to masm64? Let it be version 1. Everybody really needs it. Best regards, Alex
I am waiting as well for Microsoft to rename the System32 folder to System64 and update kernel32.dll to kernel64.dll. Until then we will never have a real 64-bit operating system.  :lol:

jj2007

Quote from: hutch-- on March 26, 2019, 11:01:17 PM
as it intentionally uses hard coded paths to avoid picking up the incorrect binaries, libraries and the like.

Given the incredible mess with "flexible" paths in other languages, I herewith suggest Hutch for the Nobel Prize for Relaxed Programming 8)

Google "C++" "path" "not found": About 1,600,000 results :bgrin: