MACROs and PROCs - when to use?

Started by frktons, January 01, 2013, 12:35:16 AM

frktons

I'm not sure when it is better to use a MACRO instead
of a PROC, and vice versa.

Let's assume a MACRO has some code inside, about 20 instructions,
and that MACRO is used in 10 different places during the program
execution.

Does this mean the code is repeated 10 times? Or what?
Is it better in this case to use a PROC in order to shrink program
size, but with a little overhead due to the CALL mechanism?

And in general when would you use a MACRO, and why, vs a PROC?

Thanks and happy new year.

Frank
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

qWord

Macros are text replacement, so each call to a code-producing macro generates that code again at the point of use.
There is no general rule for when to inline code or not: that is your decision. Commonly you have to balance code size against speed (the larger the code gets, the smaller the advantage of inlining it). Also note that the code size of the whole program isn't as important today as it was in the past.
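
To make the text-replacement point concrete, here is a minimal sketch, assuming a default MASM32 install (for masm32rt.inc). The names StrLenM, StrLenP and TestStr are made up for illustration and are not part of the SDK. The macro body is assembled again at every place it is used, while the proc body is assembled once and each use only costs a push and a call.

```asm
include \masm32\include\masm32rt.inc    ; assumes the default MASM32 install path

; Hypothetical macro: the whole body is emitted at every use.
StrLenM MACRO pString
    push    edi
    mov     edi, pString
    sub     eax, eax            ; al = 0, the byte to search for
    or      ecx, -1             ; ecx = -1, the maximum count
    repne   scasb               ; scan until the terminator is found
    not     ecx                 ; ecx = length + 1
    lea     eax, [ecx-1]        ; eax = length without the terminator
    pop     edi
ENDM

.data
TestStr db "Hello", 0

.code

; Hypothetical proc: the body exists once and is reached via CALL.
StrLenP PROC pString:DWORD
    push    edi
    mov     edi, pString
    sub     eax, eax
    or      ecx, -1
    repne   scasb
    not     ecx
    lea     eax, [ecx-1]
    pop     edi
    ret
StrLenP ENDP

start:
    StrLenM offset TestStr              ; body emitted here...
    StrLenM offset TestStr              ; ...and emitted a second time here
    invoke  StrLenP, offset TestStr     ; only push + call emitted here
    invoke  StrLenP, offset TestStr     ; and again only push + call
    invoke  ExitProcess, 0
end start
```

Comparing a listing or disassembly of the two pairs of calls should show the difference in generated bytes; which variant runs faster would still need to be measured.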
MREAL macros - when you need floating point arithmetic while assembling!

nidud

(post deleted)

dedndave

macros are intended to save you some typing - and they do
many macros call functions, so don't think that the 2 are mutually exclusive
a good example is the "print" macro
the real advantage of that macro is the flexibility - not that it is fast or that it saves bytes
take a look at the print macro in macros.asm   :t
it seems very simple, but it may be used many ways

but, choosing whether to implement some code via a macro or via a proc may be a speed vs size decision
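
As a rough illustration of a macro that simply wraps a function call, here is a small sketch. It assumes a default MASM32 install and uses szLen from the MASM32 library plus the print and str$ macros from macros.asm. MyLen and TestStr are hypothetical names chosen for this example only.

```asm
include \masm32\include\masm32rt.inc    ; assumes the default MASM32 install path

; Hypothetical macro function: it emits a call to szLen and hands
; back the register that holds the result, so it can be used inline.
MyLen MACRO pString
    invoke  szLen, pString
    EXITM   <eax>
ENDM

.data
TestStr db "Hello", 0

.code
start:
    mov     ebx, MyLen(offset TestStr)  ; expands to the invoke, then mov ebx, eax
    print   str$(ebx), 13, 10           ; show the length as decimal text
    invoke  ExitProcess, 0
end start
```

Here the macro saves typing and reads nicely at the call site, but the real work is still done by a proc, so the two approaches are combined rather than competing.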

nidud....
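    push    edi
    sub     eax,eax         ; al = 0, the terminator to search for
    mov     edi,string
    or      ecx,-1          ; ecx = -1, i.e. the maximum count
    repne   scasb           ; scan until the 0 byte is found
    sub     eax,2           ; eax = -2
    pop     edi
    sub     eax,ecx         ; eax = -2 - ecx = string length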


:P

frktons

So if I need the smallest code, it is better to use a PROC
called from many places. If I need a fast and flexible
solution, a MACRO has to be considered.
Thanks friends, this is a good starting point.

Vortex

Hi frktons,

Some tasks are achieved at assembly time with macros. There are a lot of examples in the \masm32\macros folder.

jj2007

And the speed vs size tradeoff is not always so obvious:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 17/10 cycles

4448    cycles for 10 * JJ
4445    cycles for 10 * Dave
1398    cycles for 10 * crt_strlen
360     cycles for 10 * MasmBasic Len

4433    cycles for 10 * JJ
4435    cycles for 10 * Dave
1411    cycles for 10 * crt_strlen
362     cycles for 10 * MasmBasic Len

4429    cycles for 10 * JJ
4423    cycles for 10 * Dave
1373    cycles for 10 * crt_strlen
362     cycles for 10 * MasmBasic Len

15      bytes for JJ
16      bytes for Dave
11      bytes for crt_strlen
7       bytes for MasmBasic Len

100     = eax JJ
100     = eax Dave
100     = eax crt_strlen
100     = eax MasmBasic Len

dedndave

i know this is a bit off-topic   :P

Jochen - i wonder how they'd compare on short strings, say, 10 bytes in length

jj2007

Quote from: dedndave on January 01, 2013, 03:16:10 AM
Jochen - i wonder how they'd compare on short strings, say, 10 bytes in length

line 281: mov Src100[10], 0

4430    cycles for 10 * JJ
4427    cycles for 10 * Dave
1432    cycles for 10 * crt_strlen
360     cycles for 10 * MasmBasic Len

900     cycles for 10 * JJ
890     cycles for 10 * Dave
207     cycles for 10 * crt_strlen
154     cycles for 10 * MasmBasic Len

dedndave

a bit surprising   :P


jj2007

Quote from: dedndave on January 01, 2013, 03:56:04 AM
a bit surprising   :P

The first rows are for 100 bytes, the next ones for 10 bytes. There is a bit of overhead, so I don't find it that surprising; repne scasb is not the fastest, so I would not use it in an innermost loop with a high count...
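
For comparison, a plain byte loop is one possible alternative to repne scasb. This is only a sketch, assuming a default MASM32 install; StrLenLoop and TestStr are hypothetical names, and this is not the code behind any of the timings above. Whether it is actually faster on a given CPU would have to be measured.

```asm
include \masm32\include\masm32rt.inc    ; assumes the default MASM32 install path

.data
TestStr db "Hello", 0

.code

; Hypothetical proc: walks the string one byte at a time.
StrLenLoop PROC pString:DWORD
    mov     eax, pString        ; eax = current position
@@:
    cmp     byte ptr [eax], 0   ; terminator reached?
    je      @F
    inc     eax                 ; no, step to the next byte
    jmp     @B
@@:
    sub     eax, pString        ; end position - start = length
    ret
StrLenLoop ENDP

start:
    invoke  StrLenLoop, offset TestStr
    print   str$(eax), 13, 10   ; show the length as decimal text
    invoke  ExitProcess, 0
end start
```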

dedndave

a little playing around...

if i have a string length of 0, i get about 88 cycles
if i comment out the REPNZ SCASB, i get about 1 or 2 cycles
well - that's a little off - we can figure about 7 or 8 cycles for the overhead code

point is - that first SCASB takes ~80 cycles
after that, it's about 4 cycles per rep on shorter strings

i suspect that's because it has to test the direction flag
you may remember our previous discussions about STD and CLD taking ~80 cycles

i think the processor is slow with the DF because of protected mode
maybe we could test that theory by booting up in real mode
not that there is anything to be gained by it - lol

dedndave

i wonder if things would be different in ring 0
seems to me that CPUID takes about 80 cycles, too
give you any ideas, Michael ?   :biggrin:

hutch--

Frank,

Macros are a convenience, but much of programming IS convenience. Macros are more powerful than that, though: if you want to, you can put procs in macros just as easily as you can put macro calls in procs. qWord is correct here that code size almost does not matter. In assembler it is generally small enough anyway, and often the minuscule gains you get are wasted by the final linker-controlled section alignment. Chasing code size is a leftover from DOS COM files: in 64k total it mattered, but with a 4 gig address space and up to 2 gig of memory allocation in 32 bit code, it simply does not matter. Write good, clear, fast code and forget the old DOS stuff.

jj2007

Quote from: dedndave on January 01, 2013, 05:48:48 AM
seems to me that CPUID takes about 80 cycles, too

Even more, it seems: 174 cycles on my Celeron.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 188/100 cycles

44097   cycles for 100 * JJ
43977   cycles for 100 * Dave
15090   cycles for 100 * crt_strlen
3678    cycles for 100 * MasmBasic Len    #### round 1, 100*100 bytes ####
17462   cycles for 100 * CPUID

5928    cycles for 100 * JJ
5828    cycles for 100 * Dave
1376    cycles for 100 * crt_strlen
1553    cycles for 100 * MasmBasic Len    #### round 2, 100*2 bytes ####
17459   cycles for 100 * CPUID