News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

Speed testing algos

Started by LordAdef, March 25, 2017, 04:24:49 PM

Previous topic - Next topic

LordAdef

Thanks to Mineiro´s tips, I managed to squeeze some good ms from the algo.

I took his dword example above but changed with a little trick I think saved me some time. Instead of doing the dword copy loop and then the byte to byte for the remainders, I rolled the dword loop 1 step beyond my target. Afterwards, I set my ptr index back. Although I will be rewriting 1 or 2 bytes, I got rid of a lot of code:

Quote
      mUNROLL3 MACRO   ;NEW
         ; copy dword chunks + 1 dword
         ; edi is brought back overwritting the extra bytes
      xor eax, eax
      .IF ecx == 64
         @@:
         cmp eax, ebx
         ja @F                           ; it goes beyond ebx
         mov dword ptr [edi+eax],"@@@@"
         add eax, 4
         jmp @B
        @@:
         add edi, ebx   

The history of my timings step by step (resumed, full list in the zip):
Quote
New Algo: 103 ms       Old Algo: 97 ms
New Algo: 102 ms       Old Algo: 96 ms
New Algo: 102 ms       Old Algo: 97 ms

after using "NEG":
   .mUNROLLdword was slower, using mUNROLL
   .SHR/SHL was slower

New Algo: 98 ms        Old Algo: 103 ms
New Algo: 98 ms        Old Algo: 102 ms
New Algo: 98 ms        Old Algo: 102 ms

after optm tChar for ecx mUNROLL2:

New Algo: 96 ms        Old Algo: 99 ms
New Algo: 94 ms        Old Algo: 104 ms
New Algo: 92 ms        Old Algo: 98 ms

after exchanging tChar for ecx:
New Algo: 91 ms        Old Algo: 97 ms
New Algo: 93 ms        Old Algo:134 ms
New Algo: 93 ms        Old Algo: 97 ms

after xor tChar, 01100000b
New Algo: 91 ms        Old Algo: 98 ms
New Algo: 90 ms        Old Algo: 97 ms
New Algo: 91 ms        Old Algo: 98 ms

after using the adapted Mineiro´s dword copy code
   .got rid of tChar local. Using ECX
New Algo: 65 ms        Old Algo: 102 ms
New Algo: 65 ms        Old Algo: 98 ms
New Algo: 66 ms        Old Algo: 102 ms

after making LineLenght an EQU
New Algo: 65 ms        Old Algo: 96 ms
New Algo: 64 ms        Old Algo: 96 ms
New Algo: 64 ms        Old Algo: 97 ms

I left old code commented for comparison reasons.

Do you guys see any place I can still improve within this algo design?

jj2007

What about this: mov edx, 12-1 ;counter
xor ebx, ebx
bak:
inc edx
@Main:
cmp edx, tLen
jz DoneDC ; DONE?
mov bl, SBYTE Ptr [esi+edx]
test bl, bl
js Negative
jz Zero
; ------------------------- It´s Positive:
add tLineLen, 1 ; inc line num
add tSum, ebx ; Sums line (to recouver missing chunk)
mUNROLL3 ;x ; MUST HAVE ecx == 64 or 32
xor ecx, 01100000b ;x
jmp bak
;       bak:
; add edx, 1
; jmp @Main

mineiro

hello sir LordAdef;

Ecx holds value to be stored
mUNROLL3 MACRO ;NEW
; copy dword chunks + 1 dword
; edi is brought back overwritting the extra bytes

xor eax, eax
; .IF ecx == "@@@@" ;64
   @@:
cmp eax, ebx
ja @F ; it goes beyond ebx
mov dword ptr [edi+eax],ecx ;"@@@@"
add eax, 4
jmp @B
  @@:
add edi, ebx
; .ELSE
;    @@:
; cmp eax, ebx ; lenght of line
; ja @F
; mov dword ptr [edi+eax],"    "
; add eax, 4
; jmp @B
;   @@:
; add edi, ebx
; .ENDIF
endm


You now put "@" or space into a register, so why not expand that to "@@@@" or "    "?
pDecompressNEW proc, string:DWORD

      LOCAL  tLine:DWORD
      LOCAL  tLineLen :DWORD
      LOCAL  tSum:DWORD

      LOCAL  tLen :DWORD
      LOCAL  tChar:DWORD ; 64@ or 32" " ; NOT USING TCHAR -> ECX

mov tLine, 0
mov tLineLen, 0
mov tSum, 0
   
    LineLenght = 160 ; Lenght of line is fixed to 160
   
    mov   edi, offset Decomp
mov   esi, InputFile(string)
    mov   tLen, ecx
mov ecx,"@@@@" ;<-------------
mov edx, 12
xor ebx, ebx
@Main:
cmp edx, tLen
jz DoneDC

mov bl, SBYTE Ptr [esi+edx] ;carrego um byte que contém um contador
test bl, bl ;se for zero ou negativo
js Negative
jz Zero
; ------------------------- ItŽs Positive:
add tLineLen, 1
add tSum, ebx
mUNROLL3
xor ecx, 01100000011000000110000001100000b ;<------------ permute 4 @@@@ to 4 spaces or vice-versa
bak:
add edx, 1
jmp @Main

Negative:
  add tLineLen, 1
neg bl
mov ecx, "    "  ;<-----------------
add tSum, ebx
mUNROLL3
; ----------- Missing chunk -----------
mov ebx, LineLenght ;x
xor ecx, 01100000011000000110000001100000b ;<------------ permute 4 @@@@ to 4 spaces or vice-versa
sub ebx, tSum ; Missing chunk value

mUNROLL3 ;x ; copy macro, byte per byte
mCHECKnext ; ecx is still the val of tChar. "mov tChar, ecx" here at this macro
  add tLine, 1
jmp bak

  Zero: ; Copy last line from edi itself. Uses MemCopy, dword.
mov eax, edi
sub eax, LineLenght

invoke MemCopy, eax, edi, LineLenght
add edi, LineLenght

mCHECKnext ;check if next val is zero -> makes LineLen & Sum = 0
add tLine, 1
mov ecx, "@@@@" ;<----------
jmp bak

DoneDC:
free esi
xor edi, edi ;edi is index 
ret
pDecompressNEW endp
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

LordAdef

hey Mineiro,

you are right, it's so obvious and still it didn't click me. in fact, with that approach, I can even loose that conditional .IF ecx==   since ecx already holds the value we need. We just need to copy ecx straight away.
I can't wait to time this and see if that saves 1 or 2 extra ms.

JJ idea above (jmp bak) is also very good! it saves 1 jump for every value. Clean and sutil.

LordAdef

Quoteafter making LineLenght an EQU
New Algo: 65 ms        Old Algo: 96 ms
New Algo: 65 ms        Old Algo: 98 ms
New Algo: 65 ms        Old Algo: 96 ms
New Algo: 64 ms        Old Algo: 96 ms
New Algo: 64 ms        Old Algo: 97 ms

after ecx = "@@@@", and bak loop adjusted
New Algo: 63 ms        Old Algo: 99 ms
New Algo: 62 ms        Old Algo: 100 ms
New Algo: 61 ms        Old Algo: 99 ms
New Algo: 62 ms        Old Algo: 98 ms
New Algo: 62 ms        Old Algo: 99 ms
New Algo: 61 ms        Old Algo: 99 ms
New Algo: 61 ms        Old Algo: 98 ms
New Algo: 61 ms        Old Algo: 99 ms

The latest moves gave it a boost of about 2-3 ms. Nice!

mineiro

I think that if you describe what your code should do we can help more.
I don't get a clear idea about what your code is doing, well, what negative,positive or zero piece codes should do.
If you describe so I can create a code and we can measure this function.

---edited----
something like, you're dealing with a map that have 2 chars (@ or space), so, you can fullfill a map with most used char, I'm supposing "@",so, the code only need draw(overwrite) river (space).
Have a lot of approaches to be tried.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

LordAdef

Sorry Mineiro,

This is related to the game I am developing in the other thread. this algo reads back my RLE map backgrounf file. Since you posted there I thought you were in the loop:

http://masm32.com/board/index.php?topic=5962.msg64373#msg64373

This is what it is, with an example. the game is a River Raid recreation.

mineiro

oh yes, my fault, I'm reading that code now.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

mineiro

Hello sir LordAdef;
Is this description right or wrong?

compressed file.dat holds a structure with 3 dword members (original size, line count and line lenght) followed by byte counters

char = @ || space
while byte counters != end of file do
if counter < 0, neg counter,count sum, store N chars, permute char, store char (160-count sum) times
if counter > 0, store N chars, permute char, count sum
if counter = 0, copy last 160 uncompressed chars (repeat last line)

I'd rather be this ambulant metamorphosis than to have that old opinion about everything

LordAdef

Hi Mineiro,

That´s the idea.
Visually, it goes like this:

@@@@@@@@@@     @@@     @@@@@@@@@@ -> this is a line. Now, my map is set to 160 chars
            10               5    3      5               x             

when compressing, I loose the last chunk (mark as x) since I can put it back on decomp. I neg the chunk before the last, to mark it as my "end of line".

So, This line will be compressed like this:   "10, 5, 3, -5"
Similar lines are mark as "0"
Like:
@@@@@@@@@@     @@@     @@@@@@@@@@
@@@@@@@@@@     @@@     @@@@@@@@@@
@@@@@@@@@@     @@@     @@@@@@@@@@

= 10, 5, 3, -5, 0, 0

you got it!
Abraços!
Alex