"Hello masm32", not a BOT, new member

Started by LordAdef, January 22, 2017, 09:42:24 AM

LordAdef

Quote from: hutch-- on March 02, 2017, 07:22:12 PM
:biggrin:

If in doubt, choose speed over size.
If not in doubt, choose speed over size.
If all else fails, choose speed over size.
Don't care what others do, choose speed over size.

And if none of that works, make it go fast !!!!

Noted, Hutch!
That's why I'm still trying to figure out a way to optimize this bit, but I can't find a solution. The small losses will add up to a performance drawback. That's why I'm going slowly and not piling up code.

jj2007

Quote from: LordAdef on March 02, 2017, 07:15:25 PM
Average: 28
Interesting test.

Indeed: 28 microseconds. If you consider that a smooth video display requires around 20 frames per second, then each frame has a budget of 50 milliseconds. So you are faster than required by a factor of almost 2000 (50 ms / 28 µs ≈ 1800). There is no performance problem; your solution with the two bitmaps is the right one :t

LordAdef

Hello my friends!

I was working on the map's RLE. I managed to compress it from about 90 KB (89,997 bytes) to 2,485 bytes.
The code is not optimized and I was really lazy in many places, but I really wanted it to do its job asap.

I know I could've used some macros (like "cmpmem" and many others) that would've made my life easier. But I chose to do it all by myself and learn from it.

The byte data looks like this:
Quote:
67      -15     0       0       0       0       67      -16     0    66      -17    65      -17     64      -18     63      -19     61      -21 .....
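For readers who want to see the idea in a compact form, here is a minimal C sketch of run-length encoding one map row into signed bytes. It is only an illustration under assumptions: the helper name `rle_encode_row` is hypothetical, a positive byte is taken to mean a run of '@' and a negative byte a run of spaces, and runs are capped at 127. The format actually used in the attached files may well differ (for instance in how the end of a line is stored).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the attached sources.
   Encodes one already-filtered map row as signed run lengths:
   a positive byte is a run of non-space characters ('@'),
   a negative byte is a run of spaces.
   Runs are capped at 127 so that each one fits in an int8_t.
   Returns the number of bytes written, or 0 if out is too small. */
size_t rle_encode_row(const char *row, size_t len, int8_t *out, size_t cap)
{
    size_t n = 0;
    size_t i = 0;

    assert(row != NULL && out != NULL);
    while (i < len) {
        char c = row[i];
        int run = 0;
        /* Count how many times the current character repeats. */
        while (i < len && row[i] == c && run < 127) {
            ++run;
            ++i;
        }
        if (n == cap)
            return 0;
        out[n++] = (int8_t)(c == ' ' ? -run : run);
    }
    return n;
}
```

With this convention a row of 5 '@', 3 spaces and 2 '@' becomes the three bytes 5, -3, 2.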

That's the output:
Quote:
== COMPRESSING FILE ==
1. Adding dwords from orig. file
2. filtering chars
3. parsing
4. checking for equal lines and building Flag array


Flags array:


01111010000000000010000000000000000000000000000001001000001111100011111111110010
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000101110000000000000000000000000000000000000000000000000000000000000000000
00001000000000000000000000000000000000000000000010000000000110000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000001001
00100000100000001001100010011000110011111111111001000111111000010111100010110010


5. Building final array
6. done.


=============== REPORT ================

file format:[dd Orig.Size][dd lineCount][dd LineLenght][data:BYTE @sp@-sp ?]



Original size= 89997
New size=  2485
Number of lines= 558
Lengh of line= 159
Equal Lines found= 71

Done...

That list of 0/1s is my lazy way to optimize repeated lines.
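A flag array of this kind could be built roughly as in the C sketch below. This is a guess at the idea, not the code from the attachment: it assumes a flag of 1 means "this line is identical to the line before it", and the helper name `flag_repeated_rows` is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: flags[i] becomes 1 when row i is byte-for-byte
   equal to row i-1, otherwise 0. The map holds line_count rows of
   line_len bytes each, stored back to back.
   Returns the number of repeated rows found. */
size_t flag_repeated_rows(const char *map, size_t line_count,
                          size_t line_len, uint8_t *flags)
{
    size_t repeats = 0;

    assert(map != NULL && flags != NULL);
    for (size_t i = 0; i < line_count; ++i) {
        flags[i] = 0;
        if (i > 0 &&
            memcmp(map + i * line_len, map + (i - 1) * line_len, line_len) == 0) {
            flags[i] = 1;
            ++repeats;
        }
    }
    return repeats;
}
```

A flagged line then needs no data of its own in the compressed stream, since the decoder can copy the previous line.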

Again, it needs a lot of cleaning up, but it works.
RichMasm and Asm files attached.

A curiosity: qEditor complained about "print str$(al)". I used print str$(eax) instead.

Now I have to make it read it back.

Last but not least: I ran some speed tests and what I got suggests .If macro is slower than the Switch/case macro. Is there a common sense on this?

.IF vs Case vs Case_:


Quote:
.IF    564   241   237   349   273   277   410   193   277   375
Case   188   245   183   205   162   163   183   161   155   165

jj2007

Quote from: LordAdef on March 19, 2017, 07:05:02 PM
A curiosity: qEditor complained about "print str$(al)". I used print str$(eax) instead.

That is not qEditor's fault. The Masm32 str$() is just less flexible than Str$().

Quote:
.If macro is slower than the Switch/case macro. Is there a common sense on this?

There is no consensus. A forum search for "switch timings" turns up long threads like this one :P

Nice work btw :t

If you have many identical lines, sorting them before encoding might be interesting.

LordAdef

Thanks JJ,

Quote:
That is not qEditor's fault. The Masm32 str$() is just less flexible than Str$().

I'm sure it isn't. But it must be something between ML.exe and JWasm. I used masm32rt to build it.

Quote:
If you have many identical lines, sorting them before encoding might be interesting.

I thought of that, and it would be a lot easier. This map has 71 repeated lines among 558 lines of 159 chars each. My reasoning was that doing it afterwards (although the logic is more complicated) would be shorter and faster.

hutch--

Alex,

The "str$()" macro in MASM32 is specifically for DWORD values, basic emulation and assembler are not the same thing. To use the str$() macro on smaller values you use MOVZX to make it the right size.
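The widening step matters for this data because the bytes are signed run lengths. The small C sketch below (function names are hypothetical) mirrors the two choices: zero-extension, which is what MOVZX does, and sign-extension, which is what MOVSX does. For a byte holding -15 the two give different DWORD values, so which instruction to use before str$() depends on how the byte should be read.

```c
#include <stdint.h>

/* Zero-extension, the effect of MOVZX: the byte is read as 0..255. */
uint32_t widen_unsigned(uint8_t b)
{
    return (uint32_t)b;
}

/* Sign-extension, the effect of MOVSX: the byte is read as -128..127.
   Written with plain arithmetic so it does not rely on how the
   compiler converts out-of-range values to int8_t. */
int32_t widen_signed(uint8_t b)
{
    return b >= 128 ? (int32_t)b - 256 : (int32_t)b;
}
```

The byte F1h widens to 241 with the first function and to -15 with the second.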

mineiro

Hello sir LordAdef;
We have several instructions for comparing data. The default is 'cmp', but to check whether something is zero you can use a logical 'or' or 'test', or, under some assumptions, 'dec' or 'sub'. These instructions can do the same job, but how efficient each one is has to be measured.

I consider the code below ugly because it uses a lot of data and little code: a one-to-one byte mapping.
;========== FILTER UNWANTED CHARS ====================
;I have not assembled this, so syntax errors may appear
printf (" 2. filtering chars\n")
mov ecx, tLen ;.if tLen == 0, abort
or ecx,ecx ;test ecx,ecx?
jz done
push esi
filter:
movzx eax,byte ptr [esi] ;load a byte of data from the source
movzx eax,byte ptr [eax + mytable] ;look up its replacement
or eax,eax ;.if replacement == 0, ignore
jz ignore_this_data
mov [esi],al ;store the transformed data
ignore_this_data:
inc esi ;next data
dec ecx ;tLen-1
jnz filter
pop esi
done:


.data
align 16
mytable label byte ;byte to byte relation, indexed by character code
db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h
db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h
db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h
db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,"@"," ","@",00h ;3Ch '<', 3Dh '=', 3Eh '>'

db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h
db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h
db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h
db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h

db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h
db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h
db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h
db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h

db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h
db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h
db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h
db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h
.code

I haven't tried to optimize this code; maybe cmovcc instructions could be used. I only posted it to show a way to avoid many comparisons.
If you include other signs like "{","}","(",")", you only need to change the byte mapping in the data table instead of adding more code with comparisons.

This table-driven code is supposed to run in constant time per byte, whereas with compares the execution time changes with each new sign you add. The table approach is also useful if you want to make your program portable to other chips such as microcontrollers. Imagine a table like hexstring_to_hex: on a PC we can just 'mov' to convert, and on other chips too, but if a lot of code is written specifically for the PC, it gets harder to port to other processors/microcontrollers. Well, I'm getting off-topic, "the good, the bad and the ugly" theme song.
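The same table-driven filter can be sketched in C, which may make the portability point easier to see. This is an illustration only: the function names are hypothetical, and the three mappings ('=' to a space, '<' and '>' to '@') are the ones discussed in this thread. A table entry of 0 means "leave the byte alone".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fills the 256-entry table. Entry 0 means "no replacement". */
void build_filter_table(unsigned char table[256])
{
    assert(table != NULL);
    memset(table, 0, 256);
    table['='] = ' ';   /* test marker becomes a blank */
    table['<'] = '@';   /* bomber markers go back to the background */
    table['>'] = '@';
}

/* One lookup per byte, no chain of comparisons. */
void filter_chars(unsigned char *buf, size_t len, const unsigned char table[256])
{
    for (size_t i = 0; i < len; ++i) {
        unsigned char replacement = table[buf[i]];
        if (replacement != 0)
            buf[i] = replacement;
    }
}
```

Adding another sign to filter then means one more table assignment, with no change to the loop.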

Edited afterwards: removed an unnecessary push instruction.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

LordAdef

Hi Mineiro, my dear fellow countryman,

Nice to see you here!

Quote:
printf (" 2. filtering chars\n")
    xor edi, edi
filter:
    movzx ebx, BYTE Ptr [esi + edi]
    Switch ebx
    Case '='                           ; = is nothing, for tests only
        mov BYTE Ptr [esi + edi], " "
    Case 60, 62                        ; <  > are bombers signs
        mov BYTE Ptr [esi + edi], '@'
    Endsw

    add edi, 1
    cmp edi, tLen
    jnz filter

I see you are referring to this section of code. I'll study your example more carefully and learn from it. But in my case there is no need for a lookup table. There are very few chars to filter.

Basically, I'm using the base map to set up my game level design and so forth. These marks need to be removed. As an example, I'm marking lines with ">" where my bombers should be placed (check out my last exe of the game for the bombers).

">" as a bomber:
Quote@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                  @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                   @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

All that filtering section does is check for these isolated symbols and change them back to the background '@'.

I might not have understood you well; please forgive me if so. I thought a test case should be quite enough for the task.

LordAdef

Forgot to say, thanks for the "or"/"test" tip when comparing to 0.

And by the way, in my proggy I´m checking the zero flag very often:

Quote:
cmp bl, 0
js Negative
jz Zero
jns Positive

I got away with "cmp bl, 0". Would "test" be a better choice then? I guess the answer is ...yes..?


edit: Self correction: AND does the job nicely
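Why AND (or TEST) of a register with itself can stand in for a compare against zero can be checked with a tiny model in C. The sketch below is not real flag hardware, just an assumption-labelled model: the names `flags8`, `flags_after_cmp` and `flags_after_test` are made up here, and only the zero and sign flags (the ones JZ, JS and JNS look at) are modelled. CMP derives the flags from a - b, while TEST and AND derive them from a & b, so for b = 0 in the first case and b = a in the second both results are simply a.

```c
#include <stdint.h>

/* Only the two flags used by JZ and JS/JNS are modelled. */
typedef struct {
    int zf; /* zero flag: the 8-bit result is 0 */
    int sf; /* sign flag: top bit of the 8-bit result */
} flags8;

static flags8 flags_from_result(uint8_t r)
{
    flags8 f;
    f.zf = (r == 0);
    f.sf = (r >> 7) & 1;
    return f;
}

/* CMP a, b: flags come from a - b, the result itself is discarded. */
flags8 flags_after_cmp(uint8_t a, uint8_t b)
{
    return flags_from_result((uint8_t)(a - b));
}

/* TEST a, b and AND a, b: flags come from a & b. */
flags8 flags_after_test(uint8_t a, uint8_t b)
{
    return flags_from_result((uint8_t)(a & b));
}
```

For every possible byte value, `flags_after_cmp(v, 0)` and `flags_after_test(v, v)` agree on both flags, so the js/jz/jns sequence behaves the same after either instruction.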

jj2007

Quote from: LordAdef on March 20, 2017, 02:00:25 PM
cmp bl, 0
js Negative
jz Zero
jns Positive
print "this is the 4th option", 13, 10
;)


Here is a nice one, the shr eax, 1 is Copyright DednDave:
  lea edi, msg
  .While 1
    xor eax, eax
    invoke GetMessage, edi, eax, eax, eax
    inc eax
    shr eax, 1
    .Break .if Zero? ; 0 (OK) or -1 (error)
    invoke TranslateMessage, edi
    invoke DispatchMessage, edi
  .Endw

dedndave

hardly a copyright - lol

but, he won't understand the advantage until he has a better understanding of the full instruction set

INC EAX is a single-byte instruction, typically taking 1 clock cycle
SHR EAX,1 is a two-byte instruction, also 1 clock

jj2007

Don't worry, Alex is learning very quickly :P

The clock cycles hardly matter in a GetMessage loop, but it's an elegant way to test for 0 and -1 simultaneously, that's why I posted it here.

include \masm32\include\masm32rt.inc ; plain Masm32

.code
start:
  mov ebx, -3
  .Repeat
    print str$(ebx), 9
    mov eax, ebx
    inc eax
    shr eax, 1
    .if Zero?
      print "ZERO", 13, 10
    .else
      print "non-zero", 13, 10
    .endif
    inc ebx
  .Until sdword ptr ebx>2
  inkey "ok?"
  exit
end start


Output:
-3      non-zero
-2      non-zero
-1      ZERO
0       ZERO
1       non-zero
2       non-zero
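The reason the trick works is that adding 1 maps the two interesting values, -1 and 0, onto 0 and 1, and a logical shift right by one bit turns both of those into 0 while every other value keeps at least one bit set. A small C rendering of the same idea, with a hypothetical function name, might look like this:

```c
#include <stdint.h>

/* Returns 1 when x is 0 or -1, following the INC / SHR idea.
   The arithmetic is done on an unsigned value so that the add wraps
   around and the shift is a logical one, like SHR. */
int is_zero_or_minus_one(int32_t x)
{
    uint32_t t = (uint32_t)x + 1u;  /* -1 becomes 0, 0 becomes 1 */
    return (t >> 1) == 0;           /* only 0 and 1 shift down to 0 */
}
```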

hutch--

This is still my preferred message loop in 32 bit code when you are not processing keystrokes.

MsgLoop proc

    LOCAL msg:MSG

    push ebx
    lea ebx, msg
    jmp getmsg

  msgloop:
    invoke TranslateMessage, ebx
    invoke DispatchMessage,  ebx
  getmsg:
    invoke GetMessage,ebx,0,0,0
    test eax, eax
    jnz msgloop

    pop ebx
    ret

MsgLoop endp


64 bit version does not look much different.

msgloop proc

    LOCAL msg    :MSG
    LOCAL pmsg   :QWORD

    mov pmsg, ptr$(msg)                     ; get the msg structure address
    jmp gmsg                                ; jump directly to GetMessage()

  mloop:
    invoke TranslateMessage,pmsg
    invoke DispatchMessage,pmsg
  gmsg:
    test rax, rv(GetMessage,pmsg,0,0,0)     ; loop until GetMessage returns zero
    jnz mloop

    ret

msgloop endp

mineiro

Hello sir LordAdef;
I read some of your sources and they look good, man; that River Raid makes me feel nostalgic. Nice. I can see that you have programming knowledge; you modeled your data so that RLE works well.
In pass 3 you can read a dword of data and compare it with the next dword, so you need to add 4 as the displacement; when they differ, you fall back to byte-by-byte comparison to normalize.
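One way to picture this is the C sketch below, which counts the length of a run four bytes at a time and finishes byte by byte. It is a slight variation on the suggestion rather than a literal translation: instead of comparing a dword with its neighbour, it compares each dword with a dword filled with the run's first byte, which guarantees that all four bytes belong to the run. The function name `run_length` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns how many bytes at the start of p (at most len) are equal
   to p[0]. */
size_t run_length(const unsigned char *p, size_t len)
{
    size_t i = 0;
    uint32_t pattern;

    assert(p != NULL);
    if (len == 0)
        return 0;

    /* Four copies of the first byte, e.g. '@' gives 40404040h. */
    pattern = 0x01010101u * p[0];

    /* Fast path: step 4 bytes at a time while whole dwords match. */
    while (i + 4 <= len) {
        uint32_t word;
        memcpy(&word, p + i, 4);   /* safe unaligned read */
        if (word != pattern)
            break;
        i += 4;
    }

    /* Slow path: settle the last few bytes one at a time. */
    while (i < len && p[i] == p[0])
        ++i;

    return i;
}
```

Since every byte of the pattern is the same, the comparison does not depend on byte order.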

Edited: I forgot to say, I tested your examples on Wine under Linux and they worked fine.

LordAdef

Hi Mineiro!!

Thanks a lot!

Quote:
I read some of your sources and they look good, man; that River Raid makes me feel nostalgic.

Yes, River Raid was a fantastic game. In order to have a goal, I chose it as a project.

Quote:
Nice. I can see that you have programming knowledge; you modeled your data so that RLE works well.

I do in fact, although I've never had a formal education. Like most of us, I started with good old BASIC, then I moved on to a couple of other things. I must admit I feel like I wasted my time! Assembly was always painted as a "don't touch it" thing. I am in love and I think I will stick with asm for good.

Quote:
In pass 3 you can read a dword of data and compare it with the next dword
This is actually a great idea!

I'm currently writing the decompression code to unroll my map back. Almost done! This is the one I need to be fast, since it's going to be working at runtime in the proggy.
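As a rough picture of what such an unrolling step involves, here is a C sketch of a decoder for signed run lengths. It rests on assumptions that may not match the real file format: a positive byte is read as a run of '@', a negative byte as a run of spaces, and the helper name `rle_decode_row` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: expands n signed run lengths into out.
   A positive value is a run of '@', a negative value a run of spaces.
   Returns the number of characters written, or 0 if the output
   buffer of cap bytes would overflow. */
size_t rle_decode_row(const int8_t *in, size_t n, char *out, size_t cap)
{
    size_t pos = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        int run = in[i];
        char c = '@';
        if (run < 0) {
            run = -run;
            c = ' ';
        }
        if ((size_t)run > cap - pos)
            return 0;
        /* One block fill per run instead of one store per character. */
        memset(out + pos, c, (size_t)run);
        pos += (size_t)run;
    }
    return pos;
}
```

Filling a whole run at once is what keeps this kind of decoder cheap at runtime, since the cost is driven by the number of runs rather than by the number of map cells.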

I'm taking it slow but steady. I want to finish this game at all costs.