Author Topic: Speed testing algos  (Read 661 times)

LordAdef

  • Member
  • ***
  • Posts: 255
Re: Speed testing algos
« Reply #15 on: March 27, 2017, 04:08:54 PM »
Thanks to Mineiro´s tips, I managed to squeeze some good ms from the algo.

I took his dword example above but changed with a little trick I think saved me some time. Instead of doing the dword copy loop and then the byte to byte for the remainders, I rolled the dword loop 1 step beyond my target. Afterwards, I set my ptr index back. Although I will be rewriting 1 or 2 bytes, I got rid of a lot of code:

Quote
      mUNROLL3 MACRO   ;NEW
         ; copy dword chunks + 1 dword
         ; edi is brought back overwritting the extra bytes
      xor eax, eax
      .IF ecx == 64
         @@:
         cmp eax, ebx
         ja @F                           ; it goes beyond ebx
         mov dword ptr [edi+eax],"@@@@"
         add eax, 4
         jmp @B
        @@:
         add edi, ebx   

The history of my timings step by step (resumed, full list in the zip):
Quote
New Algo: 103 ms       Old Algo: 97 ms
 New Algo: 102 ms       Old Algo: 96 ms
 New Algo: 102 ms       Old Algo: 97 ms

after using "NEG":
   .mUNROLLdword was slower, using mUNROLL
   .SHR/SHL was slower

 New Algo: 98 ms        Old Algo: 103 ms
 New Algo: 98 ms        Old Algo: 102 ms
 New Algo: 98 ms        Old Algo: 102 ms

after optm tChar for ecx mUNROLL2:

New Algo: 96 ms        Old Algo: 99 ms
New Algo: 94 ms        Old Algo: 104 ms
New Algo: 92 ms        Old Algo: 98 ms

after exchanging tChar for ecx:
New Algo: 91 ms        Old Algo: 97 ms
New Algo: 93 ms        Old Algo:134 ms
New Algo: 93 ms        Old Algo: 97 ms

after xor tChar, 01100000b
New Algo: 91 ms        Old Algo: 98 ms
New Algo: 90 ms        Old Algo: 97 ms
New Algo: 91 ms        Old Algo: 98 ms

after using the adapted Mineiro´s dword copy code
   .got rid of tChar local. Using ECX
New Algo: 65 ms        Old Algo: 102 ms
New Algo: 65 ms        Old Algo: 98 ms
New Algo: 66 ms        Old Algo: 102 ms

after making LineLenght an EQU
New Algo: 65 ms        Old Algo: 96 ms
New Algo: 64 ms        Old Algo: 96 ms
New Algo: 64 ms        Old Algo: 97 ms

I left old code commented for comparison reasons.

Do you guys see any place I can still improve within this algo design?

jj2007

  • Member
  • *****
  • Posts: 7558
  • Assembler is fun ;-)
    • MasmBasic
Re: Speed testing algos
« Reply #16 on: March 27, 2017, 06:41:38 PM »
What about this:
Code: [Select]
mov edx, 12-1 ;counter
xor ebx, ebx
bak:
inc edx
 @Main:
cmp edx, tLen
jz DoneDC ; DONE?
mov bl, SBYTE Ptr [esi+edx]
test bl, bl
js Negative
jz Zero
; ------------------------- It´s Positive:
  add tLineLen, 1 ; inc line num
  add tSum, ebx ; Sums line (to recouver missing chunk)
  mUNROLL3 ;x ; MUST HAVE ecx == 64 or 32
  xor ecx, 01100000b ;x
  jmp bak
;       bak:
; add edx, 1
; jmp @Main

mineiro

  • Member
  • ***
  • Posts: 365
Re: Speed testing algos
« Reply #17 on: March 28, 2017, 02:32:03 AM »
hello sir LordAdef;

Ecx holds value to be stored
Code: [Select]
mUNROLL3 MACRO ;NEW
; copy dword chunks + 1 dword
; edi is brought back overwritting the extra bytes

xor eax, eax
; .IF ecx == "@@@@" ;64
   @@:
cmp eax, ebx
ja @F ; it goes beyond ebx
mov dword ptr [edi+eax],ecx ;"@@@@"
add eax, 4
jmp @B
  @@:
add edi, ebx
; .ELSE
;    @@:
; cmp eax, ebx ; lenght of line
; ja @F
; mov dword ptr [edi+eax],"    "
; add eax, 4
; jmp @B
;   @@:
; add edi, ebx
; .ENDIF
endm

You now put "@" or space into a register, so why not expand that to "@@@@" or "    "?
Code: [Select]
pDecompressNEW proc, string:DWORD

      LOCAL  tLine:DWORD
      LOCAL  tLineLen :DWORD
      LOCAL  tSum:DWORD

      LOCAL  tLen :DWORD
      LOCAL  tChar:DWORD ; 64@ or 32" " ; NOT USING TCHAR -> ECX

mov tLine, 0
mov tLineLen, 0
mov tSum, 0
   
    LineLenght = 160 ; Lenght of line is fixed to 160
   
    mov   edi, offset Decomp
mov   esi, InputFile(string)
    mov   tLen, ecx
mov ecx,"@@@@" ;<-------------
mov edx, 12
xor ebx, ebx
 @Main:
cmp edx, tLen
jz DoneDC

mov bl, SBYTE Ptr [esi+edx] ;carrego um byte que contém um contador
test bl, bl ;se for zero ou negativo
js Negative
jz Zero
; ------------------------- ItŽs Positive:
  add tLineLen, 1
  add tSum, ebx
  mUNROLL3
  xor ecx, 01100000011000000110000001100000b ;<------------ permute 4 @@@@ to 4 spaces or vice-versa
bak:
add edx, 1
jmp @Main

Negative:
  add tLineLen, 1
neg bl
mov ecx, "    "  ;<-----------------
add tSum, ebx
mUNROLL3
; ----------- Missing chunk -----------
mov ebx, LineLenght ;x
xor ecx, 01100000011000000110000001100000b ;<------------ permute 4 @@@@ to 4 spaces or vice-versa
sub ebx, tSum ; Missing chunk value

mUNROLL3 ;x ; copy macro, byte per byte
mCHECKnext ; ecx is still the val of tChar. "mov tChar, ecx" here at this macro
  add tLine, 1
jmp bak

  Zero: ; Copy last line from edi itself. Uses MemCopy, dword.
mov eax, edi
sub eax, LineLenght

invoke MemCopy, eax, edi, LineLenght
add edi, LineLenght

mCHECKnext ;check if next val is zero -> makes LineLen & Sum = 0
add tLine, 1
mov ecx, "@@@@" ;<----------
jmp bak

DoneDC:
free esi
xor edi, edi ;edi is index 
ret
pDecompressNEW endp
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

LordAdef

  • Member
  • ***
  • Posts: 255
Re: Speed testing algos
« Reply #18 on: March 28, 2017, 03:11:26 AM »
hey Mineiro,

you are right, it's so obvious and still it didn't click me. in fact, with that approach, I can even loose that conditional .IF ecx==   since ecx already holds the value we need. We just need to copy ecx straight away.
I can't wait to time this and see if that saves 1 or 2 extra ms.

JJ idea above (jmp bak) is also very good! it saves 1 jump for every value. Clean and sutil.

LordAdef

  • Member
  • ***
  • Posts: 255
Re: Speed testing algos
« Reply #19 on: March 28, 2017, 03:55:04 AM »
Quote
after making LineLenght an EQU
New Algo: 65 ms        Old Algo: 96 ms
New Algo: 65 ms        Old Algo: 98 ms
New Algo: 65 ms        Old Algo: 96 ms
New Algo: 64 ms        Old Algo: 96 ms
New Algo: 64 ms        Old Algo: 97 ms

after ecx = "@@@@", and bak loop adjusted
New Algo: 63 ms        Old Algo: 99 ms
New Algo: 62 ms        Old Algo: 100 ms
New Algo: 61 ms        Old Algo: 99 ms
New Algo: 62 ms        Old Algo: 98 ms
New Algo: 62 ms        Old Algo: 99 ms
New Algo: 61 ms        Old Algo: 99 ms
New Algo: 61 ms        Old Algo: 98 ms
New Algo: 61 ms        Old Algo: 99 ms

The latest moves gave it a boost of about 2-3 ms. Nice!

mineiro

  • Member
  • ***
  • Posts: 365
Re: Speed testing algos
« Reply #20 on: March 28, 2017, 04:09:12 AM »
I think that if you describe what your code should do we can help more.
I don't get a clear idea about what your code is doing, well, what negative,positive or zero piece codes should do.
If you describe so I can create a code and we can measure this function.

---edited----
something like, you're dealing with a map that have 2 chars (@ or space), so, you can fullfill a map with most used char, I'm supposing "@",so, the code only need draw(overwrite) river (space).
Have a lot of approaches to be tried.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

LordAdef

  • Member
  • ***
  • Posts: 255
Re: Speed testing algos
« Reply #21 on: March 28, 2017, 05:10:56 AM »
Sorry Mineiro,

This is related to the game I am developing in the other thread. this algo reads back my RLE map backgrounf file. Since you posted there I thought you were in the loop:

http://masm32.com/board/index.php?topic=5962.msg64373#msg64373

This is what it is, with an example. the game is a River Raid recreation.

mineiro

  • Member
  • ***
  • Posts: 365
Re: Speed testing algos
« Reply #22 on: March 28, 2017, 08:37:22 AM »
oh yes, my fault, I'm reading that code now.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

mineiro

  • Member
  • ***
  • Posts: 365
Re: Speed testing algos
« Reply #23 on: March 29, 2017, 01:34:24 AM »
Hello sir LordAdef;
Is this description right or wrong?

compressed file.dat holds a structure with 3 dword members (original size, line count and line lenght) followed by byte counters

char = @ || space
while byte counters != end of file do
if counter < 0, neg counter,count sum, store N chars, permute char, store char (160-count sum) times
if counter > 0, store N chars, permute char, count sum
if counter = 0, copy last 160 uncompressed chars (repeat last line)

I'd rather be this ambulant metamorphosis than to have that old opinion about everything

LordAdef

  • Member
  • ***
  • Posts: 255
Re: Speed testing algos
« Reply #24 on: March 29, 2017, 11:15:09 AM »
Hi Mineiro,

That´s the idea.
Visually, it goes like this:

@@@@@@@@@@     @@@     @@@@@@@@@@ -> this is a line. Now, my map is set to 160 chars
            10               5    3      5               x             

when compressing, I loose the last chunk (mark as x) since I can put it back on decomp. I neg the chunk before the last, to mark it as my "end of line".

So, This line will be compressed like this:   "10, 5, 3, -5"
Similar lines are mark as "0"
Like:
@@@@@@@@@@     @@@     @@@@@@@@@@
@@@@@@@@@@     @@@     @@@@@@@@@@
@@@@@@@@@@     @@@     @@@@@@@@@@

= 10, 5, 3, -5, 0, 0

you got it!
Abraços!
Alex