SwitchToThread vs Sleep(0)

Started by jj2007, May 11, 2013, 08:17:48 AM

qWord

Quote from: Antariy on May 15, 2013, 12:38:18 PM
Hmm... Interesting. Maybe some systems return not the CPU frequency but some value scaled from it. Thank you, qWord! :t I have now changed the code to calculate the CPU frequency at runtime; that will probably work better than relying on an API ::)
Windows' performance counters are implemented with the APIC (Advanced Programmable Interrupt Controller), so their frequency is independent of the CPU's frequency.
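Because the counter ticks at its own rate, elapsed time is derived from the counter delta and the rate that QueryPerformanceFrequency reports, not from the CPU clock. A minimal Python sketch of that conversion, assuming the two counter readings and the frequency have already been obtained; the helper name `ticks_to_microseconds` and the sample numbers are illustrative, not values from a real system:

```python
def ticks_to_microseconds(start_ticks, end_ticks, frequency_hz):
    """Convert two performance-counter readings into elapsed microseconds.

    frequency_hz is the counter rate (what QueryPerformanceFrequency reports
    on Windows); it is fixed at boot and need not equal the CPU clock.
    """
    if frequency_hz <= 0:
        raise ValueError("counter frequency must be positive")
    elapsed_ticks = end_ticks - start_ticks
    # Integer arithmetic avoids float rounding for large tick counts.
    return elapsed_ticks * 1_000_000 // frequency_hz


# Illustrative numbers only: a 10 MHz counter that advanced by 156,250 ticks.
print(ticks_to_microseconds(1_000_000, 1_156_250, 10_000_000))  # 15625
```

The same delta would give a different time on a machine with a different counter rate, which is why the rate has to be queried rather than assumed.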

Antariy

It would be interesting to check how it behaves, i.e. which values we get, before and after changing the timer resolution.

Edited.

Quote from: qWord on May 15, 2013, 12:54:26 PM
Windows' performance counters are implemented with the APIC (Advanced Programmable Interrupt Controller), so their frequency is independent of the CPU's frequency.

Not always independent. Well, yes, my info is rusty :lol: Probably that is because of the standard kernel (by "kernel" I mean the HAL and the kernel together) on a single-core machine ::)


At least we can probably assume that the "default" 15.625 ms timer interval on NT systems is more or less the maximum, so it is better to fit the timing measurement into something like 10 ms.
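The 15.625 ms figure is one period of a 64 Hz clock interrupt (1000 / 64). A rough Python sketch of the budget arithmetic, assuming the CPU cycle rate has already been measured at runtime; the 64 Hz default is stated as an assumption, and the function names and the 3 GHz figure are hypothetical:

```python
# Assumption: the default NT clock interrupt fires 64 times per second.
DEFAULT_TICKS_PER_SECOND = 64


def tick_length_ms(ticks_per_second=DEFAULT_TICKS_PER_SECOND):
    """Length of one clock-interrupt period in milliseconds."""
    return 1000.0 / ticks_per_second


def cycle_budget(cpu_hz, window_ms=10.0):
    """How many CPU cycles fit into a measurement window of window_ms."""
    return int(cpu_hz * window_ms / 1000.0)


def fits_in_window(measured_cycles, cpu_hz, window_ms=10.0):
    """True if one timed run stays inside the chosen window."""
    return measured_cycles <= cycle_budget(cpu_hz, window_ms)


print(tick_length_ms())             # 15.625
print(cycle_budget(3_000_000_000))  # 30000000 cycles in 10 ms at 3 GHz
```

Keeping each timed run well under one tick leaves headroom for the scheduler overhead around it.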

jj2007

Hi Alex,
mw blocked my one-core PC completely - I had to hold down the power button for five seconds... :(

MichaelW

Quote from: jj2007 on May 15, 2013, 03:45:39 PM
Hi Alex,
mw blocked my one-core PC completely - I had to hold down the power button for five seconds... :(

I thought I had included a warning. On my P3 I use HIGH_PRIORITY_CLASS with THREAD_PRIORITY_NORMAL or THREAD_PRIORITY_ABOVE_NORMAL, but on my P4 with HT I can max out the priority with no problems.
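To see what those settings mean in scheduler terms, the base-priority table that dedndave includes in the comments of his dTime procedure in this thread can be expressed as a lookup. A small Python sketch; the dictionary layout and the `base_priority` name are illustrative, while the numbers are copied from that table:

```python
# Thread levels in the column order of the table.
THREAD_LEVELS = [
    "THREAD_PRIORITY_IDLE", "THREAD_PRIORITY_LOWEST",
    "THREAD_PRIORITY_BELOW_NORMAL", "THREAD_PRIORITY_NORMAL",
    "THREAD_PRIORITY_ABOVE_NORMAL", "THREAD_PRIORITY_HIGHEST",
    "THREAD_PRIORITY_TIME_CRITICAL",
]

# Base priority per process class, one entry per thread level above.
BASE_PRIORITY = {
    "IDLE_PRIORITY_CLASS":         [1, 2, 3, 4, 5, 6, 15],
    "BELOW_NORMAL_PRIORITY_CLASS": [1, 4, 5, 6, 7, 8, 15],
    "NORMAL_PRIORITY_CLASS":       [1, 6, 7, 8, 9, 10, 15],
    "ABOVE_NORMAL_PRIORITY_CLASS": [1, 8, 9, 10, 11, 12, 15],
    "HIGH_PRIORITY_CLASS":         [1, 11, 12, 13, 14, 15, 15],
    "REALTIME_PRIORITY_CLASS":     [16, 22, 23, 24, 25, 26, 31],
}


def base_priority(priority_class, thread_level):
    """Look up the scheduler base priority for a class/level pair."""
    return BASE_PRIORITY[priority_class][THREAD_LEVELS.index(thread_level)]


print(base_priority("HIGH_PRIORITY_CLASS", "THREAD_PRIORITY_NORMAL"))             # 13
print(base_priority("HIGH_PRIORITY_CLASS", "THREAD_PRIORITY_ABOVE_NORMAL"))       # 14
print(base_priority("REALTIME_PRIORITY_CLASS", "THREAD_PRIORITY_TIME_CRITICAL"))  # 31
```

The P3 settings land at 13 or 14, below the realtime range (16 to 31), while maxing out both settings gives 31.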

Antariy

Quote from: jj2007 on May 15, 2013, 03:45:39 PM
Hi Alex,
mw blocked my one-core PC completely - I had to hold down the power button for five seconds... :(

Sorry, Jochen :(

Antariy

Jochen, in such a situation, did you try pressing and holding [Ctrl]+[C]? That should terminate the program within some seconds (10~20). It helps if the console program hangs, but if some OS service or something similar has crashed or hung, then nothing will help. (This may sound strange or funny, but I once got a freeze while holding the left mouse button on a page in Acrobat Reader (version 7), slowly scrolling the page by dragging it with the "hand" tool. The CPU usage was very high at that moment; something strange happened to the OS.)

jj2007

Quote from: Antariy on May 15, 2013, 10:57:56 PM
Jochen, in such a situation, did you try pressing and holding [Ctrl]+[C]...

Don't worry, Alex, I had no files open, and no data was lost.
Ctrl+C was the first thing I tried, but it was really blocked completely: no mouse, no keyboard response.

dedndave

#52
well, it probably needs some fine-tuning, but this demonstrates my serialization concept

it's based on a "natural" serialization of the code stream
i.e., rather than using one of the serializing instructions (as listed by Intel),
we force the code stream to serialize based on register content
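    _serializ MACRO
        pushad                  ;push EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI
        pop     edi             ;EDI restored
        pop     esi             ;ESI restored
        pop     ebp             ;EBP restored
        pop     eax             ;saved ESP discarded (EAX is overwritten below)
        pop     ebx             ;EBX restored
        pop     edx             ;EDX restored
        pop     eax             ;EAX = original ECX
        pop     ecx             ;ECX = original EAX
        xchg    eax,ecx         ;swap back: EAX and ECX restored, flags untouched
    ENDM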

the CPU can't perform out-of-order execution if it has to wait for the sequence to finish   :biggrin:
all registers are involved, so it has to wait
as a matter of coincidence, the sequence is completely benign - no registers or flags are modified
hopefully, the time it takes to execute our sequence is more stable/repeatable than CPUID
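The claim that the sequence leaves every register unchanged can be checked by walking through it. A small Python model (not a CPU emulator, and it says nothing about timing or out-of-order behaviour; `run_serializ` and the register dictionary are illustrative) that follows PUSHAD's documented push order (EAX, ECX, EDX, EBX, original ESP, EBP, ESI, EDI) and then the macro's POPs:

```python
# PUSHAD push order; the ESP value pushed is the one before PUSHAD ran.
PUSHAD_ORDER = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Destinations of the eight POPs in the _serializ macro, in program order.
POP_ORDER = ["edi", "esi", "ebp", "eax", "ebx", "edx", "eax", "ecx"]


def run_serializ(regs):
    """Return (registers after the macro, stack slots left over)."""
    r = dict(regs)
    stack = [r[name] for name in PUSHAD_ORDER]  # last element = top of stack
    for dest in POP_ORDER:
        r[dest] = stack.pop()                   # 4th POP puts the saved ESP in EAX
    r["eax"], r["ecx"] = r["ecx"], r["eax"]     # xchg eax,ecx
    return r, len(stack)


before = {"eax": 0x11, "ecx": 0x22, "edx": 0x33, "ebx": 0x44,
          "esp": 0x0012FF00, "ebp": 0x55, "esi": 0x66, "edi": 0x77}
after, leftover = run_serializ(before)
print(after == before, leftover)  # True 0
```

Eight pushes and eight pops leave the stack balanced, and the final XCHG undoes the EAX/ECX swap created by the last two POPs.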

as i said, we need to do some fine-tuning
but, here is a sample run
i do still get outliers, in spite of the single-quantum execution
11 8 5 7 8 7 8 10 8 8
24 25 22 22 23 21 24 21 24 26
56 55 57 56 55 55 56 57 53 56


but - i think we have a nice starting place

EDIT: attachment updated - see reply #57

FORTRANS

Hi,

   Just looked at the Intel manuals I have.  Not in the serializing
section, but in the atomic operations section, it says that locked
(atomic) operations will serialize things as well.  Have you
considered using an XCHG Reg,Mem or the like to serialize things?
Just curious.  It seems that it should work better than a CPUID.

Regards,

Steve N.

jj2007

Quote from: FORTRANS on May 17, 2013, 02:44:17 AM
Have you considered using an XCHG Reg,Mem or the like to serialize things?

Steve,
Thanks, indeed that was one of the first things I tested, but it made no real difference. Besides, cpuid seems to be the "official" way to serialise.

FORTRANS

Hi,

   Okay, I had not noticed that you had looked at that.  And
yes, CPUID seems to be the code of choice in the samples I
have seen.

Thanks,

Steve N.

Gunther

Hi Steve,

Quote from: FORTRANS on May 17, 2013, 04:13:27 AM
And yes, CPUID seems to be the code of choice in the samples I have seen.

yes, CPUID seems to be the best choice. Agner Fog recommends it, too.

Gunther

dedndave

it would seem that CALL/RET does a fair job
C:\Masm32\Asm32 => dTime2
4 5 4 6 4 5 4 4 4 5
277 276 280 277 278 275 277 279 277 278
83 82 83 82 83 82 78 83 84 81
Press any key to continue ...

C:\Masm32\Asm32 => dTime2
3 3 4 3 3 3 6 4 4 3
275 279 280 277 277 276 279 275 281 275
82 83 83 83 83 84 82 84 83 81
Press any key to continue ...

dedndave

#58
        INVOKE  dTime,CodeToMeasure1,HIGH_PRIORITY_CLASS,THREAD_PRIORITY_ABOVE_NORMAL
;***********************************************************************************************

dTime   PROC USES EBX ESI EDI lpfnProc:LPVOID,dwPriClass:DWORD,dwPriLevel:DWORD

;Code Timing Function, David R. Sheldon - DednDave, Ver 1.1, May 2013

;--------------------------------------------------

;Call With: lpfnProc   = address of function to be timed
;           dwPriClass = process priority class
;           dwPriLevel = thread priority level
;
;  Returns: EAX        = clock cycles (not including function CALL/RET overhead)
;
;Also Uses: ECX, EDX, all other registers are preserved
;
;    Notes: 1) The function referenced by lpfnProc may not have any arguments.
;              It may, however, contain local variables. The time consumed creating and
;              destroying the stack frame will be included in the timing measurement.
;           2) The function referenced by lpfnProc may destroy any register contents,
;              but must balance the stack (ESP) before RETurn, of course.

;--------------------------------------------------

;Process priority class       Thread priority level     Base priority
;
;IDLE_PRIORITY_CLASS          THREAD_PRIORITY_IDLE            1
;                             THREAD_PRIORITY_LOWEST          2
;                             THREAD_PRIORITY_BELOW_NORMAL    3
;                             THREAD_PRIORITY_NORMAL          4
;                             THREAD_PRIORITY_ABOVE_NORMAL    5
;                             THREAD_PRIORITY_HIGHEST         6
;                             THREAD_PRIORITY_TIME_CRITICAL  15
;
;BELOW_NORMAL_PRIORITY_CLASS  THREAD_PRIORITY_IDLE            1
;                             THREAD_PRIORITY_LOWEST          4
;                             THREAD_PRIORITY_BELOW_NORMAL    5
;                             THREAD_PRIORITY_NORMAL          6
;                             THREAD_PRIORITY_ABOVE_NORMAL    7
;                             THREAD_PRIORITY_HIGHEST         8
;                             THREAD_PRIORITY_TIME_CRITICAL  15
;
;NORMAL_PRIORITY_CLASS        THREAD_PRIORITY_IDLE            1
;                             THREAD_PRIORITY_LOWEST          6
;                             THREAD_PRIORITY_BELOW_NORMAL    7
;                             THREAD_PRIORITY_NORMAL          8
;                             THREAD_PRIORITY_ABOVE_NORMAL    9
;                             THREAD_PRIORITY_HIGHEST        10
;                             THREAD_PRIORITY_TIME_CRITICAL  15
;
;ABOVE_NORMAL_PRIORITY_CLASS  THREAD_PRIORITY_IDLE            1
;                             THREAD_PRIORITY_LOWEST          8
;                             THREAD_PRIORITY_BELOW_NORMAL    9
;                             THREAD_PRIORITY_NORMAL         10
;                             THREAD_PRIORITY_ABOVE_NORMAL   11
;                             THREAD_PRIORITY_HIGHEST        12
;                             THREAD_PRIORITY_TIME_CRITICAL  15
;
;HIGH_PRIORITY_CLASS          THREAD_PRIORITY_IDLE            1
;                             THREAD_PRIORITY_LOWEST         11
;                             THREAD_PRIORITY_BELOW_NORMAL   12
;                             THREAD_PRIORITY_NORMAL         13
;                             THREAD_PRIORITY_ABOVE_NORMAL   14
;                             THREAD_PRIORITY_HIGHEST        15
;                             THREAD_PRIORITY_TIME_CRITICAL  15
;
;REALTIME_PRIORITY_CLASS      THREAD_PRIORITY_IDLE           16
;                             THREAD_PRIORITY_LOWEST         22
;                             THREAD_PRIORITY_BELOW_NORMAL   23
;                             THREAD_PRIORITY_NORMAL         24
;                             THREAD_PRIORITY_ABOVE_NORMAL   25
;                             THREAD_PRIORITY_HIGHEST        26
;                             THREAD_PRIORITY_TIME_CRITICAL  31

;**************************************************

;local variables

;--------------------------------------------------

        LOCAL   _hProcess     :HANDLE
        LOCAL   _hThread      :HANDLE
        LOCAL   _dwPriClass   :DWORD
        LOCAL   _dwPriLevel   :DWORD
        LOCAL   _dwAffinity   :DWORD
        LOCAL   _dwTerminalHi :DWORD
        LOCAL   _dwTerminalLo :DWORD
        LOCAL   _dwTallyHi    :DWORD
        LOCAL   _dwTallyLo    :DWORD
        LOCAL   _dwPassCount  :DWORD

;**************************************************

;initialization

;--------------------------------------------------

        INVOKE  GetCurrentProcess
        mov     _hProcess,eax
        INVOKE  GetPriorityClass,eax
        mov     _dwPriClass,eax
        INVOKE  GetCurrentThread
        mov     _hThread,eax
        INVOKE  GetThreadPriority,eax
        mov     _dwPriLevel,eax
        INVOKE  GetProcessAffinityMask,_hProcess,addr _dwAffinity,addr _dwPassCount
        INVOKE  SetProcessAffinityMask,_hProcess,1
        rdtsc
        mov     esi,eax
        mov     edi,edx
        INVOKE  Sleep,125
        rdtsc
        xor     ecx,ecx
        mov     _dwTerminalLo,eax
        mov     _dwTerminalHi,edx
        mov     _dwPassCount,ecx
        sub     eax,esi
        sbb     edx,edi
        mov     _dwTallyLo,ecx
        shld    edx,eax,2
        shl     eax,2
        mov     _dwTallyHi,ecx
        add     _dwTerminalLo,eax
        adc     _dwTerminalHi,edx
        INVOKE  Sleep,ecx
        mov     edi,offset DummyProc
        call    SinglePass
        mov     edi,offset DummyProc
        call    SinglePass
        mov     edi,offset DummyProc
        call    SinglePass
        jmp short TopOfLoop

;**************************************************

;measurement single pass

;--------------------------------------------------

;EDI = proc address

        ALIGN   16

SinglePass:
        INVOKE  SetPriorityClass,_hProcess,dwPriClass
        INVOKE  Sleep,0               ;bind new priority
        INVOKE  SetThreadPriority,_hThread,dwPriLevel
        INVOKE  Sleep,0               ;bind new level
        INVOKE  Sleep,0               ;fresh slice
        rdtsc
        push    edx                   ;Ta
        push    eax
        push    ebp
        call    edi                   ;proc to be measured
        rdtsc
        pop     ebp
        push    edx                   ;Tb
        push    eax
        INVOKE  SetPriorityClass,_hProcess,_dwPriClass
        INVOKE  Sleep,0               ;bind new priority
        INVOKE  SetThreadPriority,_hThread,_dwPriLevel
        INVOKE  Sleep,0               ;bind new level
        pop     eax
        pop     edx
        pop     esi
        pop     edi
        mov     ecx,eax
        mov     ebx,edx               ;EBX:ECX = last tsc reading
        sub     eax,esi
        sbb     edx,edi               ;EDX:EAX = measured time
        retn

;**************************************************

;empty proc

;--------------------------------------------------

        ALIGN   16

DummyProc:
        retn

;**************************************************

;measurement loop

;--------------------------------------------------

TopOfLoop:
        mov     edi,offset DummyProc  ;empty proc for reference
        call    SinglePass
        push    edx
        push    eax
        mov     edi,lpfnProc          ;code to be measured
        call    SinglePass
        pop     esi
        pop     edi
        sub     eax,esi
        sbb     edx,edi
        inc dword ptr _dwPassCount
        add     _dwTallyLo,eax
        adc     _dwTallyHi,edx
        cmp     ebx,_dwTerminalHi
        jb      TopOfLoop

        ja      dTally

        cmp     ecx,_dwTerminalLo
        jb      TopOfLoop

;**************************************************

;tally results and exit

;--------------------------------------------------

dTally: INVOKE  SetProcessAffinityMask,_hProcess,_dwAffinity
        mov     edx,_dwTallyHi
        mov     eax,_dwTallyLo
        or      edx,edx
        mov     ecx,_dwPassCount
        jns     Tdivis

        xor     eax,eax
        jmp short dTExit

Tdivis: cmp     ecx,1
        jbe     dTExit

        div     ecx
        shl     edx,1
        cmp     edx,ecx
        sbb     eax,-1

dTExit: ret

dTime   ENDP

;***********************************************************************************************
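The arithmetic in dTime's initialization and exit paths can be mirrored in a few lines of Python to make the intent explicit: the TSC delta across Sleep(125) is multiplied by 4 (SHLD/SHL by 2) to put the loop's end point roughly 500 ms after calibration, the loop test is an unsigned 64-bit compare done one dword at a time, and the final division rounds half up. Each pass subtracts the DummyProc timing, so the tally can go negative. This is a sketch of the logic only; the function names are hypothetical and nothing is measured:

```python
MASK32 = 0xFFFFFFFF


def terminal_count(tsc_before_sleep, tsc_after_sleep):
    """End point of the measurement loop: the second reading plus 4x the delta."""
    delta = tsc_after_sleep - tsc_before_sleep
    return tsc_after_sleep + (delta << 2)           # shld edx,eax,2 / shl eax,2


def loop_done(last_tsc, terminal):
    """Unsigned 64-bit compare done as in dTime: high dwords first, then low."""
    last_hi, last_lo = last_tsc >> 32, last_tsc & MASK32
    term_hi, term_lo = terminal >> 32, terminal & MASK32
    if last_hi != term_hi:
        return last_hi > term_hi                    # jb TopOfLoop / ja dTally
    return last_lo >= term_lo                       # jb TopOfLoop on the low dword


def tally_result(tally, pass_count):
    """Value dTime returns in EAX for a signed 64-bit tally."""
    if tally < 0:                                   # jns not taken: return 0
        return 0
    if pass_count <= 1:                             # jbe dTExit: low dword, no division
        return tally & MASK32
    # The real DIV also requires the quotient to fit in 32 bits.
    quotient, remainder = divmod(tally, pass_count)
    if 2 * remainder >= pass_count:                 # shl edx,1 / cmp edx,ecx / sbb eax,-1
        quotient += 1                               # round half up
    return quotient


# Illustrative readings: 375,000,000 cycles elapsed during Sleep(125).
end = terminal_count(1_000_000, 376_000_000)
print(end)                                                      # 1876000000
print(loop_done(1_875_999_999, end), loop_done(end, end))       # False True
print(tally_result(10, 4), tally_result(9, 4), tally_result(-5, 3))  # 3 2 0
```

The SBB trick avoids a branch: CF is set only when twice the remainder is below the divisor, so `sbb eax,-1` adds 1 exactly when the remainder is at least half the pass count.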

Antariy

Hi Dave :t
For dtime2:

44 36 30 28 4 7 24 35 0 28
356 468 415 386 405 354 384 431 384 376
48 58 77 144 76 84 103 101 77 79
Press any key to continue ...


Can you build and post the full code of your previous post?