The MASM Forum

General => The Campus => Topic started by: Gwyn on February 20, 2013, 08:23:57 AM

Title: Problem with LOCAL variables
Post by: Gwyn on February 20, 2013, 08:23:57 AM
Before I ask my question, I should let you know that I am a beginner in MASM and in programming altogether.
My problem, as the title says, is with local variables: they constantly get overwritten between messages in all of my small programs.
For example, if I declare two local variables
LOCAL localx:DWORD
LOCAL localy:DWORD
and then use them to store the client area size during the WM_SIZE message, the values in the locals will change (will be overwritten) before the WM_PAINT message arrives, so the code obviously paints outside the client area.
Usually I would think that the problem is with me, but the same program runs as expected if I declare those variables as global.
Just changing the code into this
.DATA
localx DWORD 0
localy DWORD 0
makes everything work fine. By loading the programs into OllyDbg, I made sure that the values originally stored are correct and that they get overwritten later.
As I said, that happens with all of my programs, and I couldn't find any solution to it except declaring global variables.
The question is, of course, why this is happening.

I would attach an example, but the only one I currently have uses a high-resolution .BMP image as a resource and is 20MB large.
Title: Re: Problem with LOCAL variables
Post by: RuiLoureiro on February 20, 2013, 08:41:31 AM
Hi Gwyn,
That never happens to me, so it seems strange.
I think the problem may be in the register EBP, which is used to access those variables. Is it preserved?
It is also strange that, as you said, it happens with all of your programs!

Another question: how do you store the client area size during the WM_SIZE message? Could you show what you are doing?
Well, I have never used local variables in window procedures.
Title: Re: Problem with LOCAL variables
Post by: Vortex on February 20, 2013, 08:58:53 AM
Hi Gwyn,

Your results are normal. Nothing unexpected. Local variables are always local to their host procedure and the scope is not global. Uninitialized local variables will contain garbage values from the stack. You need to store your critical variables in the .data or .data? section.

A quick example : you can check Iczelion's Simple Bitmap tutorial :

.data?
hBitmap dd ?
.
.
WndProc proc hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM
   LOCAL ps:PAINTSTRUCT
   LOCAL hdc:HDC
   LOCAL hMemDC:HDC
   LOCAL rect:RECT
   .if uMsg==WM_CREATE
      invoke LoadBitmap,hInstance,IDB_MAIN
      mov hBitmap,eax


hBitmap is stored in the uninitialized section .data?. There is no guarantee that the local variables in the procedure survive across consecutive calls.

http://win32assembly.programminghorizon.com/tut25.html
Title: Re: Problem with LOCAL variables
Post by: Gwyn on February 20, 2013, 09:41:02 AM
Quote from: Vortex on February 20, 2013, 08:58:53 AM
Hi Gwyn,
No guarantee for the local variables in the procedure to survive across consecutive calls.
I don't understand completely. So you are saying it's normal that values stored in local variables during the WM_SIZE message won't be the same in the WM_PAINT message? I thought that the value of a local variable is preserved for the whole procedure (for example WndProc), and now you are saying that it is preserved only within a single call, not between messages.
Could you confirm that?
I did not know that. What confused me was the examples in Petzold's book, where he does exactly that, so I thought that I was doing something wrong when it didn't work for me.

@ RuiLoureiro

Here is the code in the attachment. It's a simple mapping mode example where I change the mapping mode to MM_ISOTROPIC and display a bitmap (2048×3072) from a resource in the center of the client area.
Title: Re: Problem with LOCAL variables
Post by: MichaelW on February 20, 2013, 11:39:10 AM
I don't know what Petzold book you are referring to, but in C for a local variable to be preserved between calls you have to declare it with the storage class specifier static, which causes the compiler to store it in the uninitialized (BSS) data section, but continue to limit the scope of the variable to within the procedure.

#include <windows.h>
#include <conio.h>
#include <stdio.h>
int test1( void )
{
    int i;
    int r = i;
    i = 123;
    return r;
}
int test2( void )
{
    static int i;
    int r = i;
    i = 123;
    return r;
}
int main( void )
{
    printf("%d\n", test1());
    printf("%d\n", test1());
    printf("%d\n", test2());
    printf("%d\n", test2());
    getch();
}


200084144
4198483
0
123


_r$ = -8 ; size = 4
_i$ = -4 ; size = 4
_test1 PROC NEAR
; File c:\program files\microsoft visual c++ toolkit 2003\my\static\test.c
; Line 5
push ebp
mov ebp, esp
sub esp, 8
; Line 7
mov eax, DWORD PTR _i$[ebp]
mov DWORD PTR _r$[ebp], eax
; Line 8
mov DWORD PTR _i$[ebp], 123 ; 0000007bH
; Line 9
mov eax, DWORD PTR _r$[ebp]
; Line 10
mov esp, ebp
pop ebp
ret 0
_test1 ENDP
. . .
_BSS SEGMENT
?i@?1??test2@@9@9 DD 01H DUP (?) ; `test2'::`2'::i
; Function compile flags: /Odt
_BSS ENDS
_TEXT SEGMENT
_r$ = -4 ; size = 4
_test2 PROC NEAR
; Line 12
push ebp
mov ebp, esp
push ecx
; Line 14
mov eax, DWORD PTR ?i@?1??test2@@9@9
mov DWORD PTR _r$[ebp], eax
; Line 15
mov DWORD PTR ?i@?1??test2@@9@9, 123 ; 0000007bH
; Line 16
mov eax, DWORD PTR _r$[ebp]
; Line 17
mov esp, ebp
pop ebp
ret 0
_test2 ENDP


With MASM you have to put it in the data section, and accept a global scope.
Title: Re: Problem with LOCAL variables
Post by: dedndave on February 20, 2013, 11:40:48 AM
local variable contents are volatile
they only remain valid for the instance of a single call to the routine

when the WM_SIZE message is received, that is one instance
when the WM_PAINT message is received, that is another instance
each message is a separate instance
this can be overcome by using global variables (in the .DATA? or .DATA section)

in this particular case, you may not need to store the information
when you call the BeginPaint function, it fills a PAINTSTRUCT structure
part of that structure is a RECT rectangle structure that describes what part of the client area needs to be drawn
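
The point about each message being a separate call can be sketched in portable C. This is only an illustration, not Windows code: MSG_SIZE, MSG_PAINT and wnd_proc are made-up stand-ins for WM_SIZE, WM_PAINT and a real WndProc. The client size is kept in static variables (the C counterpart of putting them in .DATA?), so the value written during the size call is still there during the paint call.

```c
#include <assert.h>

/* Hypothetical message codes standing in for WM_SIZE and WM_PAINT. */
enum { MSG_SIZE = 1, MSG_PAINT = 2 };

/*
 * Simplified stand-in for a window procedure. Each call is a separate
 * instance with a fresh stack frame, so anything that must survive from
 * one message to the next lives in static storage, not in a plain local.
 * Returns width * height on MSG_PAINT, 0 otherwise.
 */
long wnd_proc(int msg, int width, int height)
{
    static int client_w;   /* persists between calls, like .DATA? */
    static int client_h;
    long area = 0;         /* ordinary local: valid for this call only */

    assert(width >= 0 && height >= 0);

    switch (msg) {
    case MSG_SIZE:
        client_w = width;  /* remember the size for later messages */
        client_h = height;
        break;
    case MSG_PAINT:
        area = (long)client_w * client_h;  /* uses the remembered size */
        break;
    default:
        break;
    }
    return area;
}
```

Calling wnd_proc(MSG_SIZE, 640, 480) and then wnd_proc(MSG_PAINT, 0, 0) returns the area computed from the stored size. If client_w and client_h were plain locals, the paint call would read whatever happened to be on the stack.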
Title: Re: Problem with LOCAL variables
Post by: Gwyn on February 20, 2013, 12:27:32 PM
Quote from: MichaelW on February 20, 2013, 11:39:10 AM
I don't know what Petzold book you are referring to, but in C for a local variable to be preserved between calls you have to declare it with the storage class specifier static, which causes the compiler to store it in the uninitialized (BSS) data section, but continue to limit the scope of the variable to within the procedure.
I got it now. Yes, he did use static for those declarations, but because I don't know C, I didn't know there was any difference between the two. I just thought they were regular local variables, and that is why I naively used regular local variables in MASM and tried to do the same.
Thanks, everybody, for your posts. I can say I now understand how to properly use local variables.
Title: Re: Problem with LOCAL variables
Post by: RuiLoureiro on February 21, 2013, 05:02:27 AM
Gwyn,

    «Thanks everybody for their posts,
    i can say i understand now how to properly use local variables»

    . So I think your problem is solved.

      Meanwhile I want to say this:

        1. I use local variables very, very rarely;
           this is why I have no problems with them!
        2. I never use them in window procedures.
           If we want to use them there,
           we can use them inside one message, but
           not from one message to another.
           There is no problem with LOCAL ps:PAINTSTRUCT
           because it is memory to be used by
           BeginPaint/EndPaint only when uMsg==WM_PAINT.
        3. Local variables should be initialized.
           They are not 0! They contain something!
        4. Why declare it static if I can define
           it in the data section?

    . Your problem is with LOCAL pos:POINT.
      When the system calls WndProc with uMsg==WM_SIZE
      you save pos.y and pos.x. But when the system
      comes again with uMsg==WM_PAINT, pos.y and pos.x
      can hold any value, nothing to do with the previous one.
Title: Re: Problem with LOCAL variables
Post by: dedndave on February 21, 2013, 05:49:27 AM
Quote
I use local variables very very rarely;
This is why i have no problems with them!
a lot of C programmers tell us to always use LOCALS, never use GLOBALS - lol
they are both tools - use the right tool for the job

Quote
I never use it in window procedures.
i try to avoid using LOCAL's in WndProc, as well
the way i do it is....
    .if uMsg==WM_PAINT
        INVOKE  PaintProc,hWnd
        xor     eax,eax

then, i put the LOCAL's in the PaintProc

Quote
Why to declare it static if i can define it in the data section ?
i thought that's what a "static" variable (C term) is in assembler
Title: Re: Problem with LOCAL variables
Post by: qWord on February 21, 2013, 05:57:11 AM
Quote from: RuiLoureiro on February 21, 2013, 05:02:27 AM
1. I use local variables very very rarely;
That is very, very unwise! Besides the straightforward scope, it can also be assumed that locals are always cached.
All in all, it is very simple: if you need to save global states, use global variables*; otherwise use locals.

Quote from: RuiLoureiro on February 21, 2013, 05:02:27 AM
4. Why to declare it static if i can define
           it in the data section ?
static == variable in data section with local scope.(?)


* assuming a single threaded environment.
Title: Re: Problem with LOCAL variables
Post by: RuiLoureiro on February 21, 2013, 06:14:31 AM
Dave,
    «a lot of C programmers tell us to always use LOCALS, never use GLOBALS»

    .   generally i dont need it, no LOCALS, no GLOBALS

qWord,
    « that is very very unwise!
    Beside the straightforward scope, it can also be assumed that locals are always cached.»

    .   To get what a local variable does inside a proc,
        I try to use other tricks. Meanwhile, it is not easy to write local
        variables when we want to use esp to access the stack (not ebp).
Title: Re: Problem with LOCAL variables
Post by: dedndave on February 21, 2013, 06:17:56 AM
Rui, this may be related to that other program being 230 kb   :biggrin:
but, i suspect you just have some data declared in the .DATA section that could be in .DATA?
Title: Re: Problem with LOCAL variables
Post by: RuiLoureiro on February 21, 2013, 06:25:42 AM
Dave, Yes i have a lot of data in .data section.
And yes some of them could be in .data?. One day i
will do that work !  ;)
Title: Re: Problem with LOCAL variables
Post by: Vortex on February 21, 2013, 06:25:54 AM
Quote
Meanwhile it is not easy to write local variables when we want to use esp to access the stack (not ebp).

With a little care, it's possible.
Title: Re: Problem with LOCAL variables
Post by: RuiLoureiro on February 21, 2013, 06:29:32 AM
Quote from: Vortex on February 21, 2013, 06:25:54 AM
Quote
Meanwhile it is not easy to write local variables when we want to use esp to access the stack (not ebp).

With a little care, it's possible.
Yes i know and i do
Title: Re: Problem with LOCAL variables
Post by: jj2007 on February 21, 2013, 11:30:59 AM
Quote from: qWord on February 21, 2013, 05:57:11 AM
it can also be assumed that locals are always cached

Is that a valid argument, given that you have to write to them before you use them?
Title: Re: Problem with LOCAL variables
Post by: MichaelW on February 21, 2013, 11:53:57 AM
Even assuming that the processors don't provide preferential caching of the stack, the active stack would likely be cached because it's frequently accessed.
Title: Re: Problem with LOCAL variables
Post by: jj2007 on February 21, 2013, 12:00:25 PM
Right, the garbage that is in your local variables is probably cached.

But the good values that you have in global variables are also probably cached, and you don't have to write to the global memory each and every time before using them. So they are faster on average.
Title: Re: Problem with LOCAL variables
Post by: MichaelW on February 21, 2013, 12:15:26 PM
Good point, but consider that caching also works for write accesses. So for locals the initial write would likely be cached, where for the first of a localized group of globals the initial read would likely be uncached. Which is faster would depend on your global access patterns, but I agree that initialized globals are likely to be faster on average.
Title: Re: Problem with LOCAL variables
Post by: jj2007 on February 21, 2013, 12:31:12 PM
Quote from: MichaelW on February 21, 2013, 12:15:26 PM
Good point, but consider that caching also works for write accesses. So for locals the initial write would likely be cached, where for the first of a group of globals the initial read would likely be uncached.

For a standard GUI app, my trusty Celeron's L1 cache of 32kb will be sufficient to keep all global variables in the cache. But all local variables, with no exception, must be written not only to the cache but also to physical memory each and every time you want to use them because what is cached of them is garbage.
Title: Re: Problem with LOCAL variables
Post by: qWord on February 21, 2013, 12:47:42 PM
Quote from: jj2007 on February 21, 2013, 12:31:12 PM
For a standard GUI app, my trusty Celeron's L1 cache of 32kb will be sufficient to keep all global variables in the cache.
... and all that stuff from the other processes and threads.

Quote from: jj2007 on February 21, 2013, 12:31:12 PM
But all local variables, with no exception, must be written not only to the cache but also to physical memory each and every time you want to use them because what is cached of them is garbage.
no, IIRC we commonly have write-back cache for user-land memory, thus a write back to phy. mem. only occurs if the cache is full, some kind of synchronization applies or the cache control decides that the region is no longer needed.
Title: Re: Problem with LOCAL variables
Post by: jj2007 on February 21, 2013, 01:26:02 PM
Quote from: qWord on February 21, 2013, 12:47:42 PM
Quote from: jj2007 on February 21, 2013, 12:31:12 PM
For a standard GUI app, my trusty Celeron's L1 cache of 32kb will be sufficient to keep all global variables in the cache.
... and all that stuff from the other processes and threads.

Basically, when you context switch, all of the memory addresses that the processor "remembers" in it's cache effectively become useless.
(http://stackoverflow.com/questions/5440128/thread-context-switch-vs-process-context-switch)

Quote
Quote from: jj2007 on February 21, 2013, 12:31:12 PM
But all local variables, with no exception, must be written not only to the cache but also to physical memory each and every time you want to use them because what is cached of them is garbage.
no, IIRC we commonly have write-back cache for user-land memory, thus a write back to phy. mem. only occurs if the cache is full, some kind of synchronization applies or the cache control decides that the region is no longer needed.

"a write back to phy. mem. only occurs if the cache is full" may be correct but is irrelevant. Your LOCAL rc:RECT in the WndProc may be at a different address every time you write to it. Remember that on entry to the WndProc, esp varies.
Title: Re: Problem with LOCAL variables
Post by: MichaelW on February 21, 2013, 01:36:40 PM
Quote from: jj2007 on February 21, 2013, 01:26:02 PM
Your LOCAL rc:RECT in the WndProc may be at a different address every time you write to it. Remember that on entry to the WndProc, esp varies.

A different address, but still likely a cached address.

Title: Re: Problem with LOCAL variables
Post by: qWord on February 21, 2013, 02:58:32 PM
Quote from: jj2007 on February 21, 2013, 01:26:02 PM
Basically, when you context switch, all of the memory addresses that the processor "remembers" in it's cache effectively become useless. (http://stackoverflow.com/questions/5440128/thread-context-switch-vs-process-context-switch)
That does not make much sense, because caches also work with physical addresses and not only with virtual addresses. Also, looking in Intel's manuals, you will see that there are solutions for the problem of different address spaces.
Besides, that theory would make the large caches used nowadays useless, because it would mean the whole cache needs to be copied for each of the thousands of context switches per second.
Title: Re: Problem with LOCAL variables
Post by: dedndave on February 21, 2013, 03:19:25 PM
this is a bit beyond the campus   :P

but, i would think the internal cache is somehow shared between contexts
an external cache can be switched with the context by changing page table entries
Title: Re: Problem with LOCAL variables
Post by: MichaelW on February 21, 2013, 05:42:58 PM
If the first use for a local is as a storage destination, as you would use a local PAINTSTRUCT for example, then there is no initialization penalty. This code compares the access times for globals and locals,  sequential access and random access:

;==============================================================================
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
;==============================================================================
;--------------------------------------------------------
; This is an assembly-time random number generator based
; on code by George Marsaglia:
;   #define znew  ((z=36969*(z&65535)+(z>>16))<<16)
;   #define wnew  ((w=18000*(w&65535)+(w>>16))&65535)
;   #define MWC   (znew+wnew)
;--------------------------------------------------------
@znew_seed@ = 362436069
@wnew_seed@ = 521288629

@rnd MACRO base:REQ
    LOCAL znew, wnew
    @znew_seed@ = 36969 * (@znew_seed@ AND 65535) + (@znew_seed@ SHR 16)
    znew = @znew_seed@ SHL 16
    @wnew_seed@ = 18000 * (@wnew_seed@ AND 65535) + (@wnew_seed@ SHR 16)
    wnew = @wnew_seed@ AND 65535
    EXITM <(znew + wnew) MOD base>
ENDM
;==============================================================================
ARRAY_SIZE equ 100
;==============================================================================
.data
    array1  dd ARRAY_SIZE dup(?)
.code
;==============================================================================

align 4
globals_seq proc
    lea esi, array1
    xor ecx, ecx
  @@:
    mov eax, [esi+ecx*4]
    inc ecx
    cmp ecx, ARRAY_SIZE
    jb  @B
    ret
globals_seq endp

align 4
locals_seq proc
    LOCAL array2[ARRAY_SIZE]:DWORD
    lea esi, array2
    xor ecx, ecx
  @@:
    mov eax, [esi+ecx*4]
    inc ecx
    cmp ecx, ARRAY_SIZE
    jb  @B
    ret
locals_seq endp

align 4
globals_rnd proc
    lea esi, array1
    REPEAT ARRAY_SIZE
        mov eax, @rnd(ARRAY_SIZE)
        mov eax, [esi+eax*4]
    ENDM
    ret
globals_rnd endp

align 4
locals_rnd proc
    LOCAL array2[ARRAY_SIZE]:DWORD
    lea esi, array2             ; base register must match the [esi+eax*4] loads below
    REPEAT ARRAY_SIZE
        mov eax, @rnd(ARRAY_SIZE)
        mov eax, [esi+eax*4]
    ENDM
    ret
locals_rnd endp

;==============================================================================
start:
;==============================================================================
    invoke GetCurrentProcess
    invoke SetProcessAffinityMask, eax, 1

    REPEAT 100
        mov eax, @rnd(ARRAY_SIZE)
        printf("%d\t",eax)
    ENDM
    printf("\n")

    invoke Sleep, 5000

    REPEAT 3

        counter_begin 10000000, HIGH_PRIORITY_CLASS
        counter_end
        printf("%d cycles, empty\n", eax)

        counter_begin 10000000, HIGH_PRIORITY_CLASS
            call globals_seq
        counter_end
        printf("%d cycles, globals_seq\n", eax)

        counter_begin 10000000, HIGH_PRIORITY_CLASS
            call locals_seq
        counter_end
        printf("%d cycles, locals_seq\n", eax)

        counter_begin 10000000, HIGH_PRIORITY_CLASS
            call globals_rnd
        counter_end
        printf("%d cycles, globals_rnd\n", eax)

        counter_begin 10000000, HIGH_PRIORITY_CLASS
            call locals_rnd
        counter_end
        printf("%d cycles, locals_rnd\n\n", eax)

    ENDM

    inkey
    exit
;==============================================================================
end start
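
For readers who want to experiment with the generator outside the assembler, the Marsaglia code quoted in the comment at the top of the listing can be written as ordinary C. This is a sketch only: mwc_state, mwc_next and mwc_rnd are names made up here, and 32-bit unsigned arithmetic is assumed. It is not claimed to reproduce the exact numbers printed by the MASM program, since MASM's assembly-time arithmetic may use a different integer width and the macro has already been expanded many times before the values are printed.

```c
#include <assert.h>
#include <stdint.h>

/* State of the multiply-with-carry generator (two 32-bit words). */
typedef struct {
    uint32_t z;
    uint32_t w;
} mwc_state;

/* Same seed values as @znew_seed@ and @wnew_seed@ in the listing. */
#define MWC_SEED_Z 362436069u
#define MWC_SEED_W 521288629u

/* One step of the generator: advances both halves and combines them. */
uint32_t mwc_next(mwc_state *s)
{
    s->z = 36969u * (s->z & 65535u) + (s->z >> 16);
    s->w = 18000u * (s->w & 65535u) + (s->w >> 16);
    return (s->z << 16) + (s->w & 65535u);
}

/* Value in the range 0 .. base-1, like @rnd(base). */
uint32_t mwc_rnd(mwc_state *s, uint32_t base)
{
    assert(base != 0);
    return mwc_next(s) % base;
}
```

A state initialised with { MWC_SEED_Z, MWC_SEED_W } and passed repeatedly to mwc_rnd(&state, 100) yields a fixed, repeatable sequence of array indices, which is what the benchmark needs.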


Running on a P3 (Katmai):

0 cycles, empty
310 cycles, globals_seq
213 cycles, locals_seq
100 cycles, globals_rnd
102 cycles, locals_rnd

0 cycles, empty
311 cycles, globals_seq
213 cycles, locals_seq
100 cycles, globals_rnd
102 cycles, locals_rnd

0 cycles, empty
310 cycles, globals_seq
213 cycles, locals_seq
100 cycles, globals_rnd
102 cycles, locals_rnd


Running on a P4 (Northwood):

1 cycles, empty
219 cycles, globals_seq
221 cycles, locals_seq
88 cycles, globals_rnd
89 cycles, locals_rnd

1 cycles, empty
219 cycles, globals_seq
221 cycles, locals_seq
88 cycles, globals_rnd
96 cycles, locals_rnd

1 cycles, empty
219 cycles, globals_seq
221 cycles, locals_seq
88 cycles, globals_rnd
89 cycles, locals_rnd


For the P3, increasing the array size to 200 elements or dropping it to 30 elements increased the cycle count for the local array sequential access, putting it close to the count for the global array, I suspect because of caching effects.


Title: Re: Problem with LOCAL variables
Post by: jj2007 on February 21, 2013, 10:21:07 PM
I've looked at it from a different angle:
- local vars, need to be initialised
- global vars, either static or assigned in proc
- random stack as in a normal WndProc

Results:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
loop overhead is approx. 433/200 cycles

2081    cycles for 200 * local
979     cycles for 200 * global no init
1068    cycles for 200 * global w init
19630   cycles for 200 * local, random stack
18900   cycles for 200 * global w init, random stack
18689   cycles for 200 * global no init, random stack

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 355/200 cycles

1697    cycles for 200 * local
1059    cycles for 200 * global no init
1265    cycles for 200 * global w init
18163   cycles for 200 * local, random stack
16680   cycles for 200 * global w init, random stack
16893   cycles for 200 * global no init, random stack


The "random stack" results are distorted because nrandom is very slow. There is an option useMB=1 that switches to MasmBasic's fast Rand(). Example:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 356/200 cycles

1696    cycles for 200 * local
1058    cycles for 200 * global no init
1264    cycles for 200 * global w init
5281    cycles for 200 * local, random stack
4295    cycles for 200 * global w init, random stack
4731    cycles for 200 * global no init, random stack
Title: Re: Problem with LOCAL variables
Post by: qWord on February 21, 2013, 11:17:24 PM
Quote from: MichaelW on February 21, 2013, 05:42:58 PM
This code compares the access times for globals and locals,  sequential access and random access:
If I modify the test bed so that the Sleep is moved into the REPEAT loop, the loop count is ten and the repetition count of the counter macro is one, I get the following results (Intel i7 3610QM):
Press any key to continue ...
94 50 70 51 17 51 90 81 74 73 92 11 25 99 90 40 31 23 25 64 64 57 99 27 18 40 59 0 49 3 23 85 30 7 13 86 10 12 31 49 71 1 30 58 71 24 40 46 50 84 84 26 1 9 81 29 10 80 49 82 58 18 38 59 61 64 31 42 18 20 27 11 21 74 73 6 42 33 98 28 60 57 62 88 89 5 76 75 1 29 47 91 95 24 35 38 17 63 32 38

-134 cycles, empty
546 cycles, globals_seq
186 cycles, locals_seq
906 cycles, globals_rnd
210 cycles, locals_rnd

-32 cycles, empty
938 cycles, globals_seq
389 cycles, locals_seq
1608 cycles, globals_rnd
453 cycles, locals_rnd

0 cycles, empty
1244 cycles, globals_seq
435 cycles, locals_seq
1288 cycles, globals_rnd
354 cycles, locals_rnd

7 cycles, empty
1334 cycles, globals_seq
561 cycles, locals_seq
1270 cycles, globals_rnd
338 cycles, locals_rnd

0 cycles, empty
904 cycles, globals_seq
577 cycles, locals_seq
1603 cycles, globals_rnd
373 cycles, locals_rnd

30 cycles, empty
1113 cycles, globals_seq
621 cycles, locals_seq
1247 cycles, globals_rnd
704 cycles, locals_rnd

-30 cycles, empty
1012 cycles, globals_seq
423 cycles, locals_seq
805 cycles, globals_rnd
522 cycles, locals_rnd

-44 cycles, empty
989 cycles, globals_seq
614 cycles, locals_seq
1270 cycles, globals_rnd
989 cycles, locals_rnd

-2 cycles, empty
952 cycles, globals_seq
550 cycles, locals_seq
858 cycles, globals_rnd
492 cycles, locals_rnd

-53 cycles, empty
948 cycles, globals_seq
573 cycles, locals_seq
1290 cycles, globals_rnd
522 cycles, locals_rnd


We should not blend out cache misses by using high loop counts.
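
To see why a high loop count hides cache misses, a little arithmetic helps. The sketch below assumes a simple model that is not taken from any measurement: the first pass is "cold" and costs more, every later pass is "warm". The function name and all numbers are hypothetical.

```c
#include <assert.h>

/*
 * Average cycles per pass when the first pass is "cold" (cache misses)
 * and every later pass is "warm". All inputs are hypothetical numbers.
 */
double average_cycles(double cold, double warm, unsigned long passes)
{
    assert(passes > 0);
    return (cold + warm * (double)(passes - 1)) / (double)passes;
}
```

With a made-up cold cost of 1000 cycles and a warm cost of 100 cycles, one pass reports 1000, ten passes report 190, and ten million passes report a value indistinguishable from 100. The cold pass is still there, but the average no longer shows it.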

the result for the unmodified test:
94      50      70      51      17      51      90      81      74      73
92      11      25      99      90      40      31      23      25      64
64      57      99      27      18      40      59      0       49      3
23      85      30      7       13      86      10      12      31      49
71      1       30      58      71      24      40      46      50      84
84      26      1       9       81      29      10      80      49      82
58      18      38      59      61      64      31      42      18      20
27      11      21      74      73      6       42      33      98      28
60      57      62      88      89      5       76      75      1       29
47      91      95      24      35      38      17      63      32      38

-5 cycles, empty
112 cycles, globals_seq
112 cycles, locals_seq
33 cycles, globals_rnd
34 cycles, locals_rnd

0 cycles, empty
113 cycles, globals_seq
112 cycles, locals_seq
34 cycles, globals_rnd
34 cycles, locals_rnd

0 cycles, empty
112 cycles, globals_seq
112 cycles, locals_seq
33 cycles, globals_rnd
34 cycles, locals_rnd

Press any key to continue ...
Title: Re: Problem with LOCAL variables
Post by: dedndave on February 22, 2013, 02:51:15 AM
globals can be faster, because we often use absolute-direct addressing with them
    mov     eax,GlobalVar
not to mention, when a local is created, the routine has to adjust the stack - lol

passing the address of a global can also be faster - you push a constant
for a local, the assembler uses LEA, then PUSH EAX
PUSH EAX is fast, but the fact that you also have an LEA hurts
Title: Re: Problem with LOCAL variables
Post by: MichaelW on February 22, 2013, 04:31:22 AM
I think this thread may need to be split.

I used an assembly-time RNG so I could avoid an overhead count of 20-30 times the count for the array access.

The stack adjustment to make room for the locals is fast.

And I failed to consider that the high loop count would "blend out" cache misses, when the only thing I can see that could make locals faster is a reduction in cache misses.

This is a quick modification of the cycle count macros to allow control of the thread priority, the idea being to get consistent counts in only a small number of loops, and hopefully only one loop. I cannot reasonably run at the highest possible priority on my P3, and on my P4 w HT I had to use a loop count of 100 to get even reasonable consistency. Perhaps the newer processors, with multiple physical cores, can do better.

  ; ----------------------------------------------------------------------
  ; These two macros perform the grunt work involved in measuring the
  ; processor clock cycle count for a block of code. These macros must
  ; be used in pairs, and the block of code must be placed in between
  ; the counter_begin and counter_end macro calls. The counter_end macro
  ; returns the clock cycle count for a single pass through the block of
  ; code, corrected for the test loop overhead, in EAX.
  ;
  ; These macros require a .586 or higher processor directive.
  ;
  ; The essential differences between these macros and the previous macros
  ; are that these save and restore the original priorities, and provide
  ; a way to control the thread priority. Control of the thread priority
  ; allows timing code at the highest possible priority by combining
  ; REALTIME_PRIORITY_CLASS with THREAD_PRIORITY_TIME_CRITICAL.
  ;
  ; Note that running at the higher priority settings on a single core
  ; processor involves some risk, as it will cause your process to
  ; preempt *all* other processes, including critical Windows processes.
  ; Using HIGH_PRIORITY_CLASS in combination with THREAD_PRIORITY_NORMAL
  ; should generally be safe.
  ; ----------------------------------------------------------------------

    counter_begin MACRO loopcount:REQ, process_priority:REQ, thread_priority
        LOCAL label

        IFNDEF __counter__qword__count__
          .data
          ALIGN 8             ;; Optimal alignment for QWORD
            __counter__qword__count__  dq 0
            __counter__loop__count__   dd 0
            __counter__loop__counter__ dd 0
            __process_priority_class__ dd 0
            __thread_priority__        dd 0
            __current_process__        dd 0
            __current_thread__         dd 0
          .code
        ENDIF

        mov __counter__loop__count__, loopcount
        invoke GetCurrentProcess
        mov __current_process__, eax
        invoke GetPriorityClass, __current_process__
        mov __process_priority_class__, eax
        invoke SetPriorityClass, __current_process__, process_priority
        IFNB <thread_priority>
            invoke GetCurrentThread
            mov __current_thread__, eax
            invoke GetThreadPriority, __current_thread__
            mov __thread_priority__, eax
            invoke SetThreadPriority, __current_thread__, thread_priority
        ENDIF
        xor eax, eax          ;; Use same CPUID input value for each call
        cpuid                 ;; Flush pipe & wait for pending ops to finish
        rdtsc                 ;; Read Time Stamp Counter

        push edx              ;; Preserve high-order 32 bits of start count
        push eax              ;; Preserve low-order 32 bits of start count
        mov   __counter__loop__counter__, loopcount
        xor eax, eax
        cpuid                 ;; Make sure loop setup instructions finish
      ALIGN 16                ;; Optimal loop alignment for P6
      @@:                     ;; Start an empty reference loop
        sub __counter__loop__counter__, 1
        jnz @B

        xor eax, eax
        cpuid                 ;; Make sure loop instructions finish
        rdtsc                 ;; Read end count
        pop ecx               ;; Recover low-order 32 bits of start count
        sub eax, ecx          ;; Low-order 32 bits of overhead count in EAX
        pop ecx               ;; Recover high-order 32 bits of start count
        sbb edx, ecx          ;; High-order 32 bits of overhead count in EDX
        push edx              ;; Preserve high-order 32 bits of overhead count
        push eax              ;; Preserve low-order 32 bits of overhead count

        xor eax, eax
        cpuid
        rdtsc
        push edx              ;; Preserve high-order 32 bits of start count
        push eax              ;; Preserve low-order 32 bits of start count
        mov   __counter__loop__counter__, loopcount
        xor eax, eax
        cpuid                 ;; Make sure loop setup instructions finish
      ALIGN 16                ;; Optimal loop alignment for P6
      label:                  ;; Start test loop
        __counter__loop__label__ equ <label>
    ENDM

    counter_end MACRO
        sub __counter__loop__counter__, 1
        jnz  __counter__loop__label__

        xor eax, eax
        cpuid                 ;; Make sure loop instructions finish
        rdtsc                 ;; Read end count
        pop ecx               ;; Recover low-order 32 bits of start count
        sub eax, ecx          ;; Low-order 32 bits of test count in EAX
        pop ecx               ;; Recover high-order 32 bits of start count
        sbb edx, ecx          ;; High-order 32 bits of test count in EDX
        pop ecx               ;; Recover low-order 32 bits of overhead count
        sub eax, ecx          ;; Low-order 32 bits of adjusted count in EAX
        pop ecx               ;; Recover high-order 32 bits of overhead count
        sbb edx, ecx          ;; High-order 32 bits of adjusted count in EDX

        mov DWORD PTR __counter__qword__count__, eax
        mov DWORD PTR __counter__qword__count__ + 4, edx

        invoke SetPriorityClass,__current_process__,__process_priority_class__
        IFNB <thread_priority>
            invoke SetThreadPriority, __current_thread__, __thread_priority__
        ENDIF

        finit
        fild __counter__qword__count__
        fild __counter__loop__count__
        fdiv
        fistp __counter__qword__count__

        mov eax, DWORD PTR __counter__qword__count__
    ENDM


I think altering the priority for each loop may not be the best way to do it, because there is no knowing what hoops Windows may be jumping through to do this. Perhaps altering the priority at startup would produce better results.
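
The overhead correction that the two macros perform (time an empty reference loop, time the test loop, subtract, divide by the loop count) can be sketched in portable C. This is an assumption-laden illustration, not a translation of the macros: the tick source is passed in as a function pointer, integer division replaces the FPU rounding, and fake_now and fake_body are hypothetical helpers that exist only to make the example deterministic.

```c
#include <assert.h>

typedef unsigned long long ticks_t;
typedef ticks_t (*tick_source)(void);   /* e.g. a wrapper around RDTSC */
typedef void (*test_body)(void);

/*
 * Time `loops` passes of an empty loop, then `loops` passes that call
 * `body`, subtract the first from the second and divide by `loops`.
 * Returns the average ticks per pass attributable to `body`.
 */
long long ticks_per_pass(tick_source now, test_body body, unsigned long loops)
{
    ticks_t start, overhead, total;
    volatile unsigned long i;   /* volatile so the empty loop is not removed */

    assert(now != 0 && body != 0 && loops > 0);

    start = now();
    for (i = loops; i != 0; --i) { }
    overhead = now() - start;

    start = now();
    for (i = loops; i != 0; --i) { body(); }
    total = now() - start;

    return ((long long)total - (long long)overhead) / (long long)loops;
}

/* Hypothetical deterministic clock: only the test body advances it. */
ticks_t fake_ticks;
ticks_t fake_now(void) { return fake_ticks; }
void fake_body(void) { fake_ticks += 5; }   /* pretend the body costs 5 ticks */
```

With the fake clock, ticks_per_pass(fake_now, fake_body, 1000) returns 5. The result is signed on purpose: with a real counter and a loop count of one, the reference loop can occasionally take longer than the test loop, which is one way slightly negative readings such as "-134 cycles, empty" can come about.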

Edit:

Added the:

mov __thread_priority__, eax


That I left out.



Title: Re: Problem with LOCAL variables
Post by: qWord on February 22, 2013, 06:17:28 AM
Quote from: MichaelW on February 22, 2013, 04:31:22 AM
And I failed to consider that the high loop count would "blend out" cache misses, when the only thing I can see that could make locals faster is a reduction in cache misses.
After the first round everything is cached for all variants. From this point on we are measuring the plain execution time of the instructions and not the memory access. Isn't memory access what we want to measure when testing the claim "it can also be assumed that locals are always cached"?
Looking at my result with your (unmodified) test, we can see that at least on my i7 there is no difference between the global and local variants (the same as for your P4). That's why I think we should make one-shot measurements.