The MASM Forum

General => The Campus => Topic started by: frktons on January 01, 2013, 12:35:16 AM

Title: MACROs and PROCs - when to use?
Post by: frktons on January 01, 2013, 12:35:16 AM
I'm not sure when it is better to use a MACRO instead
of a PROC, and vice versa.

Let's assume a MACRO has some code inside, about 20 instructions,
and that MACRO is used in 10 different places during the program
execution.

Does this mean the code is repeated 10 times? Or what?
Is it better in this case to use a PROC in order to shrink program
size, but with a little overhead due to the CALL mechanism?

And in general when would you use a MACRO, and why, vs a PROC?

Thanks and happy new year.

Frank
Title: Re: MACROs and PROCs - when to use?
Post by: qWord on January 01, 2013, 12:46:02 AM
Macros are text replacement, so each call to a code-producing macro generates code.
There is no general rule for when to inline code or not - that is your decision. Commonly you have to balance code size against speed (the larger the code gets, the smaller the advantage of inlining becomes). Also note that the code size of the whole program isn't as important today as it was in the past.
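
As a rough illustration of "each call generates code", here is a minimal C sketch that uses a preprocessor macro as a stand-in for a MASM MACRO and a function as a stand-in for a PROC. The names STRLEN_INLINE, strlen_call and total_length are made up for this example. It is only an analogy: a C compiler may decide to inline the function on its own, whereas a MASM PROC always stays a call.

```c
#include <stddef.h>

/* Macro (stand-in for a MASM MACRO): the loop text is pasted in at
   every use, so ten uses produce ten copies of the loop. */
#define STRLEN_INLINE(result, str)            \
    do {                                      \
        const char *s_ = (str);               \
        const char *p_ = s_;                  \
        while (*p_ != '\0')                   \
            p_++;                             \
        (result) = (size_t)(p_ - s_);         \
    } while (0)

/* Function (stand-in for a PROC): one copy of the loop, reached by a call. */
static size_t strlen_call(const char *str)
{
    const char *p = str;
    while (*p != '\0')
        p++;
    return (size_t)(p - str);
}

/* Two macro uses: the loop is expanded twice inside this function. */
static size_t total_length(const char *a, const char *b)
{
    size_t la, lb;
    STRLEN_INLINE(la, a);   /* first copy of the loop */
    STRLEN_INLINE(lb, b);   /* second copy of the loop */
    return la + lb;
}
```

Calling strlen_call from ten places adds ten calls but only one loop body, while ten uses of STRLEN_INLINE add ten loop bodies and no call overhead.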
Title: Re: MACROs and PROCs - when to use?
Post by: nidud on January 01, 2013, 01:39:38 AM
deleted
Title: Re: MACROs and PROCs - when to use?
Post by: dedndave on January 01, 2013, 02:08:09 AM
macros are intended to save you some typing - and they do
many macros call functions, so don't think that the 2 are mutually exclusive
a good example is the "print" macro
the real advantage of that macro is the flexibility - not that it is fast or that it saves bytes
take a look at the print macro in macros.asm   :t
it seems very simple, but it may be used many ways

but choosing whether to implement some code via a macro or via a proc may be a speed vs size decision

nidud....
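    push    edi
    sub     eax,eax             ; al = 0, the byte to search for
    mov     edi,string
    or      ecx,-1              ; ecx = -1 (maximum count)
    repne   scasb               ; scan up to and including the zero byte; ecx ends at -(length+2)
    sub     eax,2               ; eax = -2
    pop     edi
    sub     eax,ecx             ; eax = -2 + (length+2) = length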


:P
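
To see why the snippet above leaves the string length in eax, the register arithmetic can be replayed in C. This is only a sketch of the instruction semantics as I understand them; the function name scasb_strlen is hypothetical, and the "ecx reached zero" exit of repne is left out because it cannot trigger for an ordinary zero-terminated string.

```c
#include <stdint.h>

/* Replays the register arithmetic of the repne scasb snippet. */
static uint32_t scasb_strlen(const char *string)
{
    uint32_t eax = 0;             /* sub eax,eax : al = 0 is the byte searched for */
    const char *edi = string;     /* mov edi,string */
    uint32_t ecx = 0xFFFFFFFFu;   /* or ecx,-1 */

    /* repne scasb: compare al with [edi], advance edi, decrement ecx,
       and stop once the compared byte matched al. */
    for (;;) {
        char c = *edi++;
        ecx--;
        if (c == (char)eax)
            break;
    }

    /* length+1 bytes were scanned, so ecx is now -(length+2). */
    eax -= 2;                     /* sub eax,2   : eax = -2 */
    eax -= ecx;                   /* sub eax,ecx : eax = -2 + (length+2) */
    return eax;
}
```

For "hello" the loop runs six times (five characters plus the terminator), ecx ends at -7, and -2 - (-7) gives 5.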
Title: Re: MACROs and PROCs - when to use?
Post by: frktons on January 01, 2013, 02:23:22 AM
So if I need the smallest code, it is better to use a PROC
called from many places. If I need a fast and flexible
solution, a MACRO has to be considered.
Thanks friends, this is a good starting point.
Title: Re: MACROs and PROCs - when to use?
Post by: Vortex on January 01, 2013, 02:54:08 AM
Hi frktons,

Some tasks are achieved at assembly time with macros. There are a lot of examples in the \masm32\macros folder.
Title: Re: MACROs and PROCs - when to use?
Post by: jj2007 on January 01, 2013, 03:12:51 AM
And the speed vs size tradeoff is not always so obvious:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 17/10 cycles

4448    cycles for 10 * JJ
4445    cycles for 10 * Dave
1398    cycles for 10 * crt_strlen
360     cycles for 10 * MasmBasic Len

4433    cycles for 10 * JJ
4435    cycles for 10 * Dave
1411    cycles for 10 * crt_strlen
362     cycles for 10 * MasmBasic Len

4429    cycles for 10 * JJ
4423    cycles for 10 * Dave
1373    cycles for 10 * crt_strlen
362     cycles for 10 * MasmBasic Len

15      bytes for JJ
16      bytes for Dave
11      bytes for crt_strlen
7       bytes for MasmBasic Len

100     = eax JJ
100     = eax Dave
100     = eax crt_strlen
100     = eax MasmBasic Len
Title: Re: MACROs and PROCs - when to use?
Post by: dedndave on January 01, 2013, 03:16:10 AM
i know this is a bit off-topic   :P

Jochen - i wonder how they'd compare on short strings, say, 10 bytes in length
Title: Re: MACROs and PROCs - when to use?
Post by: jj2007 on January 01, 2013, 03:38:23 AM
Quote from: dedndave on January 01, 2013, 03:16:10 AM
Jochen - i wonder how they'd compare on short strings, say, 10 bytes in length

line 281: mov Src100[10], 0

4430    cycles for 10 * JJ
4427    cycles for 10 * Dave
1432    cycles for 10 * crt_strlen
360     cycles for 10 * MasmBasic Len

900     cycles for 10 * JJ
890     cycles for 10 * Dave
207     cycles for 10 * crt_strlen
154     cycles for 10 * MasmBasic Len
Title: Re: MACROs and PROCs - when to use?
Post by: dedndave on January 01, 2013, 03:56:04 AM
a bit surprising   :P
Title: Re: MACROs and PROCs - when to use?
Post by: jj2007 on January 01, 2013, 04:37:35 AM
Quote from: dedndave on January 01, 2013, 03:56:04 AM
a bit surprising   :P

The first rows are for 100 bytes, the next ones for 10 bytes. There is a bit of overhead, so I don't find it that surprising; repne scasb is not the fastest, so I would not use it in an innermost loop with a high count...
Title: Re: MACROs and PROCs - when to use?
Post by: dedndave on January 01, 2013, 05:11:20 AM
a little playing around...

if i have a string length of 0, i get about 88 cycles
if i comment out the REPNZ SCASB, i get about 1 or 2 cycles
well - that's a little off - we can figure about 7 or 8 cycles for the overhead code

point is - that first SCASB takes ~80 cycles
after that, it's about 4 cycles per rep on shorter strings

i suspect that's because it has to test the direction flag
you may remember our previous discussions about STD and CLD taking ~80 cycles

i think the processor is slow with the DF because of protected mode
maybe we could test that theory by booting up in real mode
not that there is anything to be gained by it - lol
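
The "fixed cost plus a few cycles per byte" picture can be checked against the figures jj2007 posted earlier for the "Dave" row: 4427 cycles for 10 calls on a 100-byte string and 890 cycles for 10 calls on a 10-byte string. The sketch below fits a straight line through those two points. The linear model and the names CostModel and fit_two_points are assumptions made for this example, and the timings still include some loop overhead.

```c
/* Two-point fit of: cycles per call = fixed + per_byte * length. */
typedef struct {
    double fixed;      /* cost that does not depend on the string length */
    double per_byte;   /* additional cost per scanned byte */
} CostModel;

static CostModel fit_two_points(double len1, double cycles1,
                                double len2, double cycles2)
{
    CostModel m;
    m.per_byte = (cycles1 - cycles2) / (len1 - len2);
    m.fixed = cycles2 - m.per_byte * len2;
    return m;
}
```

With fit_two_points(100, 442.7, 10, 89.0) this gives about 3.9 cycles per byte and about 50 cycles of fixed cost per call on that Celeron M, which is in line with the "about 4 cycles per rep" estimate.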
Title: Re: MACROs and PROCs - when to use?
Post by: dedndave on January 01, 2013, 05:48:48 AM
i wonder if things would be different in ring 0
seems to me that CPUID takes about 80 cycles, too
give you any ideas, Michael ?   :biggrin:
Title: Re: MACROs and PROCs - when to use?
Post by: hutch-- on January 02, 2013, 12:43:21 AM
Frank,

Macros are a convenience, and much of programming IS convenience, but macros are more powerful than that: if you want to, you can put procs in macros just as easily as you can put macro calls in procs. qWord is correct here that code size almost does not matter. In assembler it is generally small enough anyway, and the minuscule gains you get are often wasted by the section alignment applied at link time. Chasing code size is a leftover from DOS COM files: with 64k in total it mattered, but with a 4 gig address space and up to 2 gig of memory allocation in 32 bit code, it simply does not matter. Write good, clear, fast code and forget the old DOS stuff.
Title: Re: MACROs and PROCs - when to use?
Post by: jj2007 on January 02, 2013, 01:21:34 AM
Quote from: dedndave on January 01, 2013, 05:48:48 AM
seems to me that CPUID takes about 80 cycles, too

Even more, it seems: 174 cycles on my Celeron.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 188/100 cycles

44097   cycles for 100 * JJ
43977   cycles for 100 * Dave
15090   cycles for 100 * crt_strlen
3678    cycles for 100 * MasmBasic Len    #### round 1, 100*100 bytes ####
17462   cycles for 100 * CPUID

5928    cycles for 100 * JJ
5828    cycles for 100 * Dave
1376    cycles for 100 * crt_strlen
1553    cycles for 100 * MasmBasic Len    #### round 2, 100*2 bytes ####
17459   cycles for 100 * CPUID
Title: Re: MACROs and PROCs - when to use?
Post by: MichaelW on January 02, 2013, 01:46:16 AM
In my test it depended on the CPUID function number in EAX, with the higher numbers requiring more cycles, up through 3 IIRC where it leveled out.
Title: Re: MACROs and PROCs - when to use?
Post by: frktons on January 02, 2013, 10:13:28 AM
Interesting data, and intriguing strlen:

#if defined(_MSC_VER) && (_MSC_VER != 1400 || !defined(ppc)) && (_MSC_VER != 1310) && (_MSC_VER != 1200) && defined(_UNICODE)
#pragma function(wcslen)
#endif

/* Calculate the length of the string pointed by pStr (excluding the
* terminating '\0')
*/
size_t _tcslen(const _TCHAR *pStr)
{
    const _TCHAR *pEnd;

    for (pEnd = pStr; *pEnd != _TEXT('\0'); pEnd++)
        continue;

    return pEnd - pStr;
}
©2004 Microsoft Corporation. All rights reserved.

Being about 7:1 faster than repne scasb means it is well
optimized by the C compiler. And I can't figure out how.
Title: Re: MACROs and PROCs - when to use?
Post by: jj2007 on January 02, 2013, 11:16:22 AM
Quote from: frktons on January 02, 2013, 10:13:28 AM
Interesting data, and intriguing strlen:

Being about 7:1 faster than repne scasb means it is well optimized by the C compiler. And I can't figure out how.

CRT strlen is only about 3 times faster than repne scasb, and it's no mystery (ecx=pString):
77C178C0        8B01                 mov eax, [ecx]
77C178C2        BA FFFEFE7E          mov edx, 7EFEFEFF
77C178C7        03D0                 add edx, eax
77C178C9        83F0 FF              xor eax, FFFFFFFF
77C178CC        33C2                 xor eax, edx
77C178CE        83C1 04              add ecx, 4
77C178D1        A9 00010181          test eax, 81010100
77C178D6       74 E8                je short 77C178C0
77C178D8        8B41 FC              mov eax, [ecx-4]
77C178DB        84C0                 test al, al
77C178DD       74 32                je short 77C17911
77C178DF        84E4                 test ah, ah
77C178E1       74 24                je short 77C17907
77C178E3        A9 0000FF00          test eax, 00FF0000
77C178E8       74 13                je short 77C178FD
77C178EA        A9 000000FF          test eax, FF000000
77C178EF       74 02                je short 77C178F3
77C178F1       EB CD                jmp short 77C178C0
77C178F3        8D41 FF              lea eax, [ecx-1]
77C178F6        8B4C24 04            mov ecx, [esp+4]
77C178FA        2BC1                 sub eax, ecx
77C178FC        C3                   retn
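
The loop at the top of this listing tests four bytes per iteration. Below is a minimal C sketch of that word test, using the two constants that appear in the listing (7EFEFEFF and 81010100). The function names are made up here, and this is my reading of the disassembly, not the CRT source. Adding 7EFEFEFF produces a carry out of every non-zero byte into the "hole" bits selected by 81010100; a missing carry shows up as a set bit after the two xors.

```c
#include <stdint.h>

/* Fast test: returns nonzero if w MAY contain a zero byte.
   It never misses a zero byte, but it can report a false positive
   when the top byte is 0x80 and no byte is actually zero. */
static int may_have_zero_byte(uint32_t w)
{
    uint32_t t = (w + 0x7EFEFEFFu) ^ ~w;   /* add edx,eax / xor eax,-1 / xor eax,edx */
    return (t & 0x81010100u) != 0;         /* test eax,81010100 */
}

/* Slow confirmation, like the test al,al / test ah,ah / ... chain. */
static int has_zero_byte(uint32_t w)
{
    return (w & 0x000000FFu) == 0 || (w & 0x0000FF00u) == 0 ||
           (w & 0x00FF0000u) == 0 || (w & 0xFF000000u) == 0;
}
```

A word such as 0x80414243 passes the fast test without containing a zero byte, which would explain why the listing re-checks the four bytes one by one and jumps back to the loop (jmp short 77C178C0) when none of them is zero.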
Title: Re: MACROs and PROCs - when to use?
Post by: frktons on January 02, 2013, 11:31:31 AM
I was talking about the performance on my pc:
Quote
Intel(R) Core(TM)2 CPU  E6600  @ 2.40GHz (SSE4)
loop overhead is approx. 17/10 cycles

6617    cycles for 10 * JJ
6607    cycles for 10 * Dave
937     cycles for 10 * crt_strlen
221     cycles for 10 * MasmBasic Len

6602    cycles for 10 * JJ
6619    cycles for 10 * Dave
852     cycles for 10 * crt_strlen
220     cycles for 10 * MasmBasic Len

6611    cycles for 10 * JJ
6605    cycles for 10 * Dave
866     cycles for 10 * crt_strlen
220     cycles for 10 * MasmBasic Len

You are probably using SSE2 code to parallelize the check
for the zero byte, while strlen probably uses a 4-bytes-at-a-time
approach, the trick of the holes.

mov edx, 7EFEFEFF

Of course, if you have better code inside a MACRO, it'll be faster
than a counterpart in a PROC that uses a single-byte approach.  :t