I'm not sure when it is better to use a MACRO instead
of a PROC, and vice versa.
Let's assume a MACRO has some code inside, about 20 instructions,
and that MACRO is used in 10 different places during the program
execution.
Does this mean the code is repeated 10 times? Or what?
Is it better in this case to use a PROC in order to shrink program
size, but with a little overhead due to the CALL mechanism?
And in general when would you use a MACRO, and why, vs a PROC?
Thanks and happy new year.
Frank
Macros are text replacement, thus each call to a code-producing macro will generate code.
There is no general rule for when to inline code or not - that is your decision. Commonly you have to balance code size against speed (the larger the code gets, the smaller the advantage of inlining becomes). Also note that the code size of the whole program isn't as important today as it was in the past.
macros are intended to save you some typing - and they do
many macros call functions, so don't think that the 2 are mutually exclusive
a good example is the "print" macro
the real advantage of that macro is the flexibility - not that it is fast or that it saves bytes
take a look at the print macro in macros.asm :t
it seems very simple, but it may be used many ways
but, choosing whether to implement some code as a macro or as a proc may be a speed-vs-size decision
nidud....
push  edi
sub   eax, eax      ; al = 0, the byte to scan for
mov   edi, string
or    ecx, -1       ; ecx = -1, maximum count
repne scasb         ; scan until the terminating zero is found
sub   eax, 2        ; eax = -2
pop   edi
sub   eax, ecx      ; length = -2 - ecx
:P
So if I need the smallest code, it is better to use a PROC
called from many places. If I need a fast and flexible
solution, a MACRO has to be considered.
Thanks friends, this is a good starting point.
Hi frktons,
Some tasks are achieved at assembly time with macros. There are a lot of examples in the \masm32\macros folder.
And the speed vs size tradeoff is not always so obvious:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
loop overhead is approx. 17/10 cycles
4448 cycles for 10 * JJ
4445 cycles for 10 * Dave
1398 cycles for 10 * crt_strlen
360 cycles for 10 * MasmBasic Len
4433 cycles for 10 * JJ
4435 cycles for 10 * Dave
1411 cycles for 10 * crt_strlen
362 cycles for 10 * MasmBasic Len
4429 cycles for 10 * JJ
4423 cycles for 10 * Dave
1373 cycles for 10 * crt_strlen
362 cycles for 10 * MasmBasic Len
15 bytes for JJ
16 bytes for Dave
11 bytes for crt_strlen
7 bytes for MasmBasic Len
100 = eax JJ
100 = eax Dave
100 = eax crt_strlen
100 = eax MasmBasic Len
i know this is a bit off-topic :P
Jochen - i wonder how they'd compare on short strings, say, 10 bytes in length
Quote from: dedndave on January 01, 2013, 03:16:10 AM
Jochen - i wonder how they'd compare on short strings, say, 10 bytes in length
line 281: mov Src100[10], 0
4430 cycles for 10 * JJ
4427 cycles for 10 * Dave
1432 cycles for 10 * crt_strlen
360 cycles for 10 * MasmBasic Len
900 cycles for 10 * JJ
890 cycles for 10 * Dave
207 cycles for 10 * crt_strlen
154 cycles for 10 * MasmBasic Len
a bit surprising :P
Quote from: dedndave on January 01, 2013, 03:56:04 AM
a bit surprising :P
The first rows are for 100 bytes, the next ones for 10 bytes. There is a bit of overhead, so I don't find it that surprising; repne scasb is not the fastest, so I would not use it in an innermost loop with a high count...
a little playing around...
if i have a string length of 0, i get about 88 cycles
if i comment out the REPNZ SCASB, i get about 1 or 2 cycles
well - that's a little off - we can figure about 7 or 8 cycles for the overhead code
point is - that first SCASB takes ~80 cycles
after that, it's about 4 cycles per rep on shorter strings
i suspect that's because it has to test the direction flag
you may remember our previous discussions about STD and CLD taking ~80 cycles
i think the processor is slow with the DF because of protected mode
maybe we could test that theory by booting up in real mode
not that there is anything to be gained by it - lol
i wonder if things would be different in ring 0
seems to me that CPUID takes about 80 cycles, too
give you any ideas, Michael ? :biggrin:
Frank,
Macros are a convenience, but then much of programming IS convenience, and macros are more powerful than that: if you want to, you can put procs in macros just as easily as you can put macro calls in procs. qword is correct here in that code size almost does not matter; in assembler it is generally small enough anyway, and often the minuscule gains you get are wasted on the final link-controlled section alignment. Chasing code size is a leftover from DOS COM files: in 64k total it mattered, but with a 4 gig address space and up to 2 gig memory allocation in 32-bit code, it simply does not matter. Write good clear fast code, and forget the old DOS stuff.
Quote from: dedndave on January 01, 2013, 05:48:48 AM
seems to me that CPUID takes about 80 cycles, too
Even more, it seems: 174 cycles on my Celeron.
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
loop overhead is approx. 188/100 cycles
44097 cycles for 100 * JJ
43977 cycles for 100 * Dave
15090 cycles for 100 * crt_strlen
3678 cycles for 100 * MasmBasic Len #### round 1, 100*100 bytes ####
17462 cycles for 100 * CPUID
5928 cycles for 100 * JJ
5828 cycles for 100 * Dave
1376 cycles for 100 * crt_strlen
1553 cycles for 100 * MasmBasic Len #### round 2, 100*2 bytes ####
17459 cycles for 100 * CPUID
In my test it depended on the CPUID function number in EAX, with the higher numbers requiring more cycles, up through 3 IIRC where it leveled out.
Interesting data, and intriguing strlen:
#if defined(_MSC_VER) && (_MSC_VER != 1400 || !defined(ppc)) && \
    (_MSC_VER != 1310) && (_MSC_VER != 1200) && defined(_UNICODE)
#pragma function(wcslen)
#endif
/* Calculate the length of the string pointed to by pStr (excluding the
 * terminating '\0')
 */
size_t _tcslen(const _TCHAR *pStr)
{
    const _TCHAR *pEnd;

    for (pEnd = pStr; *pEnd != _TEXT('\0'); pEnd++)
        continue;

    return pEnd - pStr;
}
©2004 Microsoft Corporation. All rights reserved.
Being about 7:1 faster than repne scasb means it is well
optimized by the C compiler, and I can't figure out how.
Quote from: frktons on January 02, 2013, 10:13:28 AM
Interesting data, and intriguing strlen:
Being about 7:1 faster than repne scasb means it is well optimized by the C compiler. And I can't figure it How?
CRT strlen is only about 3 times faster than repne scasb, and it's no mystery (ecx=pString):
77C178C0 8B01 mov eax, [ecx]
77C178C2 BA FFFEFE7E mov edx, 7EFEFEFF
77C178C7 03D0 add edx, eax
77C178C9 83F0 FF xor eax, FFFFFFFF
77C178CC 33C2 xor eax, edx
77C178CE 83C1 04 add ecx, 4
77C178D1 A9 00010181 test eax, 81010100
77C178D6 74 E8 je short 77C178C0
77C178D8 8B41 FC mov eax, [ecx-4]
77C178DB 84C0 test al, al
77C178DD 74 32 je short 77C17911
77C178DF 84E4 test ah, ah
77C178E1 74 24 je short 77C17907
77C178E3 A9 0000FF00 test eax, 00FF0000
77C178E8 74 13 je short 77C178FD
77C178EA A9 000000FF test eax, FF000000
77C178EF 74 02 je short 77C178F3
77C178F1 EB CD jmp short 77C178C0
77C178F3 8D41 FF lea eax, [ecx-1]
77C178F6 8B4C24 04 mov ecx, [esp+4]
77C178FA 2BC1 sub eax, ecx
77C178FC C3 retn
I was talking about the performance on my pc:
Quote
Intel(R) Core(TM)2 CPU E6600 @ 2.40GHz (SSE4)
loop overhead is approx. 17/10 cycles
6617 cycles for 10 * JJ
6607 cycles for 10 * Dave
937 cycles for 10 * crt_strlen
221 cycles for 10 * MasmBasic Len
6602 cycles for 10 * JJ
6619 cycles for 10 * Dave
852 cycles for 10 * crt_strlen
220 cycles for 10 * MasmBasic Len
6611 cycles for 10 * JJ
6605 cycles for 10 * Dave
866 cycles for 10 * crt_strlen
220 cycles for 10 * MasmBasic Len
You are probably using SSE2 code to parallelize the check
for the zero byte, while strlen probably uses a four-bytes-at-a-time
approach, the trick of the holes.
mov edx, 7EFEFEFF
Of course, if you have better code inside a MACRO it'll be faster
than its counterpart in a PROC that uses a single-byte approach. :t