The MASM Forum

General => The Laboratory => Topic started by: jj2007 on May 11, 2013, 08:17:48 AM

Title: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 11, 2013, 08:17:48 AM
I am playing with the question of how to get the number of cycles in a single time slice:

  mov ebx, 200
  .Repeat
   invoke SwitchToThread   ; get new slice
   pushad
   cpuid   ; serialise
   popad
   rdtsc
   push eax
   push edx
   if mode eq 1
      invoke Sleep, 0
   elseif mode eq 2
      invoke SwitchToThread
   elseif mode eq 3
      push 0
      .Repeat
         dec dword ptr [esp]
      .Until Sign?
      pop eax
   else
      ; do nothing
   endif
   rdtsc
   pop edx
   pop ecx
   sub eax, ecx
   ;sub eax, 11   ; AMD: sub 11
   print str$(eax), " "
   dec ebx
  .Until Sign?


Results:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Mode Sleep 0    2724 1356 1128 1044 1128 1104 1068 1008 1044 1044 1056 1044 1104 1128 1104 1032 1044 1020
1056 1044 1044 1140 1164 1140 1080 1080 1092 1056 1068 1056 1140 1164 1176 1080 1080 1092 1044 1080 1080
1176 1140 1140 1056 1044 1056 1056 1080 1068 1128 1152 1152 1044 1056 1068 1068 1068 1068 1116 1128 1140
1068 1044 1068 1056 1044 1080 1116 1116 1116 1044 1080 1044 1068 1056 1068 1128 1140 1128 1056 1104 1068
1044 1044 1044 1152 1152 1116 1068 1068 1068 1032 1068 1056 1128 1140 1116 1068 1068 1056 1068 1068 1044
1128 1116 1128 1032 1032 1020 1044 1068 1044 1104 1116 1104 1080 1068 1044 1056 1068 1068 7128 1140 1152
1056 1044 1056 1032 1056 1056 1128 1152 1152 1068 1032 1032 1032 1056 1044 1104 1116 1152 1056 1056 1044
1056 1032 1032 1104 1128 1128 1092 1044 1068 1044 1056 1032 1116 1116 1152 1044 1056 1044 1056 1032 1056
1104 1140 1128 1032 1044 1044 1044 1056 1044 1128 1128 1116 1056 1032 1020 1020 1020 1032 1116 1128 1164
1056 1056 1044 1044 1032 1032 1116 1116 1116 1056 1056 1044 1044 1044 1068

Mode SwitchToThread     480 456 468 444 456 456 456 456 492 504 468 468 444 456 456 456 444 504 480 468 456
456 456 444 456 456 492 492 456 456 444 456 456 456 456 504 492 468 444 456 444 456 456 456 504 504 468
444 456 456 456 444 456 492 492 456 456 444 456 444 456 444 504 480 456 456 444 456 456 456 444 504 504
456 456 456 456 456 456 456 504 492 456 456 456 456 456 456 456 504 480 456 444 456 456 456 456 444 504
492 456 456 456 456 456 456 456 504 492 468 456 456 444 456 456 444 504 480 456 456 456 444 444 456 444 492
492 468 456 456 456 444 444 444 492 480 456 456 456 456 444 456 444 504 480 456 444 444 456 444 456 444
492 492 456 456 444 456 444 456 444 492 480 456 456 456 456 456 456 456 492 480 456 456 444 456 444 456
456 492 480 468 456 444 456 456 456 456 492 480 468 456 456 456 456 456 456 492 492 468 444

Mode loop       120 96 96 96 108 96 96 96 96 108 108 108 108 96 96 108 108 108 108 108 96 108 108 96 108
96 96 96 96 96 96 96 96 96 96 96 108 108 108 108 96 108 108 96 108 96 96 96 108 108 96 96 108 96 108 108
108 96 108 108 96 108 108 108 108 108 96 96 108 96 108 96 96 96 96 96 96 108 108 108 96 108 96 96 96 108
108 96 96 96 108 108 108 108 96 108 96 108 96 108 108 108 96 108 108 108 96 96 96 96 96 108 96 96 108 108
96 108 108 96 108 96 108 96 96 96 108 108 96 96 96 108 96 108 96 108 108 96 96 96 96 108 108 108 96 108
108 108 96 108 108 96 96 108 108 96 108 96 96 96 96 96 96 108 108 96 108 96 96 108 96 108 96 108 108 108
108 108 96 108 108 108 96 108 108 108 108 96 108 108 108 96 96 96 108 96 96 96 108 96 96

Mode nothing    96 108 96 96 96 96 96 96 96 96 96 96 108 108 96 96 96 108 96 108 96 108 96 96 108 96 96 108
96 96 96 96 96 108 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 108 96 96 108 96 96 96
96 96 96 96 96 108 96 108 96 96 108 96 108 96 96 96 108 96 96 96 96 108 96 96 108 96 96 96 96 96 96 96 96
96 96 108 96 96 96 96 108 96 96 96 108 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96
96 96 96 96 108 96 96 96 96 108 96 96 96 108 96 96 96 96 96 96 96 96 96 96 96 108 96 108 96 108 96 96 96
96 108 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 108 96 96 108 96 96 96 96 96 96 108 96 96 96 96 96 108
96 96 96 96 96 108


Does it make any sense? On my other puter the count went down to a minimum of 10.
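
A rough way to read these numbers (a sketch, not part of the original test): subtract the "nothing" baseline, which is just the rdtsc/push overhead, from the other modes. The snippet uses only the first ten values of each run above; taking the median rather than the mean is an assumption made here so that outliers like 2724 do not dominate.

```python
from statistics import median

# First ten values of each mode, copied from the Celeron M results above.
sleep0   = [2724, 1356, 1128, 1044, 1128, 1104, 1068, 1008, 1044, 1044]
switch   = [480, 456, 468, 444, 456, 456, 456, 456, 492, 504]
baseline = [96, 108, 96, 96, 96, 96, 96, 96, 96, 96]   # "Mode nothing"

def net_cycles(samples, empty):
    """Median cost of a mode minus the median cost of the empty measurement."""
    return median(samples) - median(empty)

net_sleep0 = net_cycles(sleep0, baseline)
net_switch = net_cycles(switch, baseline)
print("Sleep(0):       ", net_sleep0, "cycles")
print("SwitchToThread: ", net_switch, "cycles")
```

Under these assumptions the values are the cost of the call itself on an otherwise idle machine, not the length of a time slice.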
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 11, 2013, 09:32:12 AM
Hi Jochen! :t


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
Mode Sleep 0    9864 5552 7800 9760 8288 4336 3944 4472 8296 4672 3128 3912 4112
4080 3696 3560 3792 3528 4224 3952 3520 3800 3472 4200 3896 3768 3520 3384 3520
3480 3400 3392 3400 3520 3520 3432 4040 3424 3944 3784 3416 3544 3536 3760 3384
4032 3536 3440 4280 3880 3520 3872 3416 7384 3968 3872 5944 4096 5456 7344 4064
6984 3984 3584 3848 3504 3832 3456 3840 3920 4288 3848 3960 4240 3960 3400 3568
3472 4096 3824 3080 3960 4160 4024 4096 3872 3408 3400 3112 3112 3112 4080 3512
3424 3600 3488 5496 4048 3864 4184 3432 3552 3736 4504 3624 3824 3976 3560 3600
3984 3448 3480 3384 7256 3960 4136 3416 3496 3480 66288 6024 4384 5536 8640 5512
6656 4040 11104 4096 3824 9272 3984 3520 3464 3608 3752 3960 3600 4168 3768 3536
3424 3736 3624 3896 3792 3456 3384 3880 3016 3400 3472 3400 3816 3992 3768 4088
3928 3800 3552 3632 3448 3072 3352 3104 3648 3264 3472 4008 3464 3856 4056 3864
5440 4136 3528 3432 6928 3808 3632 3456 3696 4088 4008 3904 3824 3992 4336 3520
3448 3432 5024 4632 3440 3336 4392 3992 3856 3856 3856 3456

Mode SwitchToThread     21352 1504 888 888 888 888 888 888 888 888 888 888 888 888
888 888 888 1064 888 1048 888 888 888 888 888 888 1144 888 888 1112 1048 888
1104 888 1144 888 888 888 888 1048 888 888 888 1064 1048 888 1048 888 888 888 1040
888 888 888 888 1064 888 1064 888 1040 888 888 888 888 1184 888 1064 888 1064
888 888 888 888 1048 888 1064 888 1120 888 1064 1048 888 888 888 1048 888 1040
888 1048 888 1064 1040 1048 888 888 888 888 1080 888 888 22016 309032 1136 1048
1048 888 888 888 1064 888 888 888 888 1064 888 1040 888 1104 888 888 888 1136 1048
888 888 888 1064 1048 888 888 888 1064 40128 1040 1048 888 888 888 1080 888 888
888 888 888 888 1040 1064 888 888 888 888 888 888 1048 888 888 888 888 1064 1160
888 1048 888 1040 1064 1064 888 888 888 888 888 888 1040 888 1064 888 1152 888
1120 1040 888 888 888 888 888 888 888 888 888 1064 1048 1064 888 1040 1048 1288
920 888 888 888 888

Mode loop       96 104 96 104 96 104 104 96 104 104 96 104 104 96 104 104 104 104
104 104 96 104 104 96 96 96 96 96 104 96 96 96 104 96 96 104 104 104 104 104 104
104 96 104 104 96 96 104 104 104 104 96 96 104 96 96 96 96 96 104 96 96 96 104
104 104 96 104 104 96 96 104 96 104 96 104 96 104 96 96 96 96 104 104 104 96 96
104 96 104 96 96 96 104 104 96 96 104 104 96 96 104 104 96 96 96 96 104 104 104
96 96 104 104 104 104 104 104 96 104 104 96 96 104 104 104 104 104 104 104 96
96 104 96 96 104 104 104 104 96 96 104 96 96 104 104 104 104 104 104 96 104 104
104 96 104 96 96 96 96 96 104 104 96 96 104 104 104 104 104 104 96 104 96 104 96
104 96 96 96 96 104 96 96 104 104 96 96 96 96 104 104 104 96 96 104 96 104 104
104 96

Mode nothing    96 104 104 96 104 104 96 96 104 104 104 104 96 104 104 104 104 104
96 96 96 96 96 104 96 96 96 96 96 104 96 104 104 104 104 104 96 104 96 96 96
96 104 96 96 104 96 96 96 104 104 104 96 104 104 104 104 96 104 96 104 104 104 96
104 96 104 104 104 96 104 104 96 96 104 104 104 104 104 104 96 96 104 96 96 96
96 96 104 104 104 96 96 104 104 96 96 104 96 104 104 96 104 104 104 96 104 96 104
104 96 104 104 96 96 96 104 104 104 96 96 96 104 96 96 96 104 104 96 104 96 96
104 96 104 104 104 104 96 104 104 96 104 104 96 96 96 104 96 104 104 96 96 96
96 104 96 104 96 96 96 96 104 104 104 96 96 104 104 96 96 104 104 104 96 96 104
96 104 96 96 104 96 96 104 104 104 96 96 104 96 96 104 104 104 104 96 104 96 96
96


ok
Title: Re: SwitchToThread vs Sleep(0)
Post by: MichaelW on May 11, 2013, 11:47:01 AM
Measuring the length of a time slice is an interesting problem that I have attempted to solve several times before with no success. This is a crude proof of concept app that appears to mostly work. Running on my 3.0GHz P4 Windows XP test system I get ~31ms per time slice.

;==============================================================================
include \masm32\include\masm32rt.inc
.686
;==============================================================================
.data
    hThread1 dd 0
    hThread2 dd 0
    pBuff    dd 0
    count    dd 0
.code
;==============================================================================
ThreadProc1 proc uses ebx lpParameter:LPVOID
    .WHILE count < 10000000
        xor eax, eax
        cpuid
        rdtsc
        mov ecx, count
        mov ebx, pBuff
        mov DWORD PTR [ebx+ecx*8], 1
        mov DWORD PTR [ebx+ecx*8+4], eax
        inc count
    .ENDW
    ret
ThreadProc1 endp
;==============================================================================
ThreadProc2 proc uses ebx lpParameter:LPVOID
    .WHILE count < 10000000
        xor eax, eax
        cpuid
        rdtsc
        mov ecx, count
        mov ebx, pBuff
        mov DWORD PTR [ebx+ecx*8], 2
        mov DWORD PTR [ebx+ecx*8+4], eax
        inc count
    .ENDW
    ret
ThreadProc2 endp
;==============================================================================
start:
;==============================================================================

    invoke GetCurrentProcess
    invoke SetProcessAffinityMask, eax, 1

    ;--------------------------------------------------------------------------
    ; Minimize interruptions (THESE PRIORITY LEVELS NOT SAFE FOR SINGLE CORE).
    ;--------------------------------------------------------------------------

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, REALTIME_PRIORITY_CLASS
    invoke GetCurrentThread
    invoke SetThreadPriority, eax, THREAD_PRIORITY_TIME_CRITICAL

    ;-----------------------------------------------------------
    ; Buffer entries will be a dword thread identifier (1 or 2)
    ; followed by the low-order dword of the current TSC.
    ;-----------------------------------------------------------

    mov pBuff, alloc(10000000*8+1000)

    invoke CreateThread, NULL, 0, ThreadProc1, 0, 0, NULL
    mov hThread1, eax
    invoke CreateThread, NULL, 0, ThreadProc2, 0, 0, NULL
    mov hThread2, eax

    invoke WaitForMultipleObjects, 2, ADDR hThread1, TRUE, INFINITE

    ;--------------------------------------------------------------
    ; Display only buffer entries where thread identifier changed.
    ;--------------------------------------------------------------

    mov ebx, -1
    xor edi, edi
    mov esi, pBuff
  @@:
    add ebx, 1
    cmp ebx, 10000000
    ja  @F
    mov eax, [esi+ebx*8]
    cmp eax, edi
    je  @B
    mov edi, eax
    printf("%d\t", eax)
    mov eax, [esi+ebx*8+4]
    printf("%d\n", eax)
    jmp @B
  @@:
    inkey
    exit
;==============================================================================
end start


1       -663624222
2       -481849142
1       -388584002
2       -294945830
1       -201522110
2       -201592358
1       -14500670
2       -14566522
1       172600054
2       266095538
1       359615594
2       453104686
0       0
1       546554114
2       640150026
1       733605030
2       733534342
1       920610562
2       1014232250
1       1014073002
2       1201347382
1       1294654642
2       1388214506
1       1481674266
2       1575242006
1       1668778034
2       1762279714
1       1855800558
2       1949369834
1       1949199638
2       2042712442
1       2136223258
2       -1971577282
1       -1878131010
2       -1784582230
1       -1691100778
2       -1597555930
1       -1504092486
2       -1504153934
1       -1317026250
2       -1223522986
1       -1130035162
2       -1036435754
1       -943030074
2       -849501226
1       -823197342
Press any key to continue ...
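
A note for reading the table: the second column is the low-order dword of the TSC printed with %d, so it appears signed and wraps around. The sketch below (Python, not part of the original program) turns consecutive entries into cycle counts and milliseconds. It assumes the TSC ticks at the nominal 3.0 GHz mentioned above; the first five values are copied from the output.

```python
# Low-order TSC dwords as printed by the program (signed because of %d).
samples = [-663624222, -481849142, -388584002, -294945830, -201522110]

TSC_HZ = 3.0e9  # assumption: TSC rate equals the nominal 3.0 GHz clock

def tsc_delta(older, newer):
    """Difference of two 32-bit TSC readings, tolerant of sign and wraparound."""
    return (newer - older) & 0xFFFFFFFF

def to_ms(cycles, hz=TSC_HZ):
    """Convert a cycle count to milliseconds at the assumed TSC rate."""
    return cycles / hz * 1000.0

deltas = [tsc_delta(a, b) for a, b in zip(samples, samples[1:])]
for d in deltas:
    print(d, "cycles =", round(to_ms(d), 1), "ms")
```

Three of the four gaps come out near 31 ms, which matches the estimate above; the first gap is roughly twice that.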




Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 11, 2013, 03:35:58 PM
Thanks, Alex & Michael. Today I'm very busy but tonight I might have a chance to look at it again.
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 11, 2013, 05:59:36 PM
Hi Michael :t

Your source probably needs MASM11 to build, but after a small change I got it to assemble into an EXE with MASM10. Here are the results:

1       -700983768
2       -573169360
1       -539986840
2       -473352576
1       -406676328
2       -340023304
1       -340056040
2       -206677800
1       -139850712
0       0
2       -73333560
1       -6666776
2       59998024
1       126676064
2       193347000
1       193303080
2       326677976
1       326643200
2       460011992
1       526689256
2       526651960
1       626656304
2       760389856
1       826697136
2       826666576
1       893336576
2       1030450656
1       1026676792
2       1093346880
1       1226727936
2       1293746208
1       1293355536
2       1429200640
1       1493412624
2       1560250120
0       0
1       1626755408
2       1694110448
1       1760088504
2       1768808272
1       1769072968
2       1769086424
1       1769346944
2       1769356696
1       1769616144
2       1769625320
1       1769884496
2       1769893560
1       1770152848
2       1770162008
1       1770420832
2       1770429848
1       1770714704
2       1770723896
1       1770984064
2       1770993120
1       1771253656
2       1771263032
1       1771522896
2       1771532576
1       1771792048
2       1771801320
1       1772062696
2       1772071896
1       1772332752
2       1772341960
1       1772623632
2       1772633104
1       1772893176
2       1772902352
1       1773161568
2       1773170752
1       1773430672
2       1773439872
1       1773699536
2       1773708752
1       1773969384
2       1773978816
1       1774238624
2       1774247744
1       1774507712
2       1774516520
1       1774805656
2       1774814728
1       1775074128
2       1775082792
1       1775342472
2       1775351632
1       1775611144
2       1775620240
1       1775880208
2       1775889496
1       1776149088
2       1776158224
1       1776418224
2       1776427360
1       1776687392
2       1776696632
1       1776978864
2       1776988096
1       1777248616
2       1777257872
1       1777517888
2       1777527016
1       1777787208
2       1777796592
1       1778056320
2       1778065816
1       1778326168
2       1778335288
1       1778594904
2       1778604368
1       1778887840
2       1778897072
1       1779158848
2       1779168280
1       1779427536
2       1779437336
1       1779697312
2       1779706232
1       1779966720
2       1779976312
1       1780236376
2       1780245616
1       1780505544
2       1780513864
1       1780774056
2       1780783368
1       1781068136
2       1781077360
1       1781338544
2       1781347928
1       1781607384
2       1781616640
1       1781876968
2       1781886216
1       1782147624
2       1782156432
1       1782418008
2       1782427904
1       1782691232
2       1782700728
1       1782960744
2       1782992888
1       1783253952
2       1783263256
1       1783525456
2       1783534816
1       1783795664
2       1783804504
1       1784065040
2       1784074416
1       1784336112
2       1784345560
1       1784606048
2       1784615512
1       1784875776
2       1784884288
1       1785168064
2       1785177560
1       1785439504
2       1785449016
1       1785710464
2       1785719928
1       1785980592
2       1785990344
1       1786250736
2       1786259864
1       1786522000
2       1786531616
1       1786793232
2       1786802744
1       1787064128
2       1787073728
1       1787360128
2       1787369352
1       1787630600
2       1787639928
1       1787901952
2       1787911512
1       1788173936
2       1788183864
1       1788445584
2       1788455328
1       1788717552
2       1788726904
1       1788989008
2       1788998768
1       1789284208
2       1789293680
1       1789555568
2       1789564968
1       1789827200
2       1789836016
1       1790098560
2       1790107944
1       1790370160
2       1790379648
1       1790641408
2       1790650824
1       1790912432
2       1790922048
1       1791184832
2       1791194848
1       1791495600
2       1791613528
1       1791878488
2       1791887728
1       1792149984
2       1792159480
1       1792421616
2       1792431408
1       1792693168
2       1792702568
1       1792964304
2       1792973640
1       1793236576
2       1793246064
0       0
1       1793415264
2       1826787000
1       1893585776
2       1960098160
1       1960054752
2       2093482120
1       -2134711480
2       -2068195632
1       -2068233672
2       -1934863720
1       -1934893520
2       -1868223504
1       -1734399224
2       -1668184408
1       -1601375912
2       -1534845376
1       -1456995456
2       -1401499368
1       -1334680240
2       -1334864416
1       -1268194320
2       -1201524256
1       -1068019200
2       -1001482176
1       -934348528
2       -934326064
1       -934063464
2       -934051816
1       -933791024
2       -933781752
1       -933519952
2       -933510504
1       -933249288
2       -933240192
1       -932979024
2       -932969960
1       -932675584
2       -932665720
1       -932405776
2       -932396648
1       -932136256
2       -932127520
1       -931867440
2       -931858064
1       -931597056
2       -931587576
1       -931324608
2       -931315200
1       -931053720
2       -931044296
1       -930782224
2       -930772952
1       -930484056
2       -930474704
1       -930213416
2       -930204400
1       -929944304
2       -929935080
1       -929674832
2       -929665400
1       -929405552
2       -929396456
1       -929135600
2       -929126384
1       -928863664
2       -928854504
1       -928571384
2       -928562200
1       -928301456
2       -928292024
1       -928031952
2       -928022648
1       -927762568
2       -927753272
1       -927492784
2       -927484000
1       -927222448
2       -927212920
1       -926951920
2       -926942560
1       -926680176
2       -926671360
1       -926386912
2       -926377568
1       -926116544
2       -926106464
1       -925844608
2       -925834032
1       -925573496
2       -925564080
1       -925303400
2       -925293648
1       -925032872
2       -925023352
1       -924762512
2       -924753304
1       -924490560
2       -924481384
1       -924198224
2       -924188584
1       -923927232
2       -923917480
1       -923655464
2       -923646248
1       -923386192
2       -923376984
1       -923115768
2       -923107080
1       -922846016
2       -922836616
1       -922577104
2       -922567680
1       -922285088
2       -922275584
1       -922012616
2       -922003288
1       -921742368
2       -921732944
1       -921473216
2       -921463768
1       -921203120
2       -921193328
1       -920932104
2       -920922560
1       -920660688
2       -920651352
1       -920389616
2       -920380672
1       -920095568
2       -920086272
1       -919825656
2       -919816488
1       -919556232
2       -919546080
1       -919285056
2       -919275816
1       -919014888
2       -919005520
1       -918744568
2       -918735232
1       -918474032
2       -918465504
1       -918204920
2       -918195136
1       -917912576
2       -917903152
1       -917641992
2       -917632656
1       -917371064
2       -917361696
1       -917100528
2       -917091192
1       -916830120
2       -916821392
1       -916560560
2       -916550824
1       -916290088
2       -916280432
1       -915996704
2       -915987120
1       -915725064
2       -915715592
1       -915453744
2       -915444424
1       -915182088
2       -915171872
1       -914909696
2       -914900280
1       -914638168
2       -914628240
1       -914366944
2       -914358232
1       -914096424
2       -914086944
1       -913800232
2       -913790816
1       -913527608
2       -913517888
1       -913256608
2       -913247200
1       -912986144
2       -912976368
1       -912714352
2       -912705024
1       -912443832
2       -912434432
1       -912173416
2       -912163800
1       -911867312
2       -911745832
1       -911477496
2       -911467120
1       -911205928
2       -911196456
1       -910934888
2       -910925016
1       -910663368
2       -910653504
1       -910391024
2       -910381888
1       -910120272
2       -910110016
1       -909848296
2       -909812544
1       -909549664
2       -909540464
1       -909278680
2       -909269272
1       -909007464
2       -908998064
1       -908736128
2       -908726312
1       -908464384
2       -908454424
1       -908192416
2       -908182896
1       -907920392
2       -907910848
1       -907622328
2       -907612848
1       -907350656
2       -907341264
1       -907079160
2       -907069744
1       -906808408
2       -906799168
1       -906535504
2       -906524224
1       -906261744
2       -906252240
1       -905990432
2       -905981088
1       -905719680
2       -905710384
1       -905426152
2       -905416576
1       -905155472
2       -905145088
1       -904882648
2       -904873248
1       -904611160
2       -904601848
1       -904341104
2       -904331592
1       -904069896
2       -904060352
1       -903799248
2       -903789696
1       -903507456
2       -903497936
1       -903235504
2       -903226296
1       -902965456
2       -902956200
1       -902693736
2       -902684280
1       -902422480
2       -902412968
1       -902151032
2       -902141416
1       -901878312
2       -901868816
1       -901607552
2       -901598168
0       0
1       -901473896
2       -901509600
1       -801304968
2       -734795960
1       -734835216
2       -601449424
1       -601495024
2       -534824896
1       -401326624
2       -334774872
1       -334816224
2       -201424456
1       -201476112
2       -68073544
1       -68135896
2       65244048
1       132353448
2       198586448
1       265677736
2       331926920
1       332045432
2       332309648
1       332326280
2       332588328
1       332604008
2       332866648
1       332882376
2       333147056
1       333162648
2       333427424
1       333443128
2       333707056
1       333722496
2       334008832
1       334024960
2       334287576
1       334303416
2       334566448
1       334581872
2       334844952
1       334860456
2       335123032
1       335138752
2       335400928
1       335416408
2       335680848
1       335696584
2       335959488
1       335976944
2       336270568
1       336287408
2       336549736
1       336567464
2       336830224
1       336846464
2       337109232
1       337124944
2       337387056
1       337402608
2       337665080
1       337681320
2       337944488
1       337960032
2       338244440
1       338261952
2       338524816
1       338541576
2       338805080
1       338821536
2       339085144
1       339100760
2       339364840
1       339380480
2       339642384
1       339657784
2       339920328
1       339937928
2       340200264
1       340241520
2       340503912
1       340520192
2       340783688
1       340800248
2       341062984
1       341078648
2       341342264
1       341357640
2       341619264
1       341637824
2       341900656
1       341916832
2       342179888
1       342195560
2       342483440
1       342500888
2       342764344
1       342781464
2       343044696
1       343062184
2       343325112
1       343341800
2       343604144
1       343620040
2       343882912
1       343899496
2       344162656
1       344178720
2       344464888
1       344481520
2       344743832
1       344759272
2       345023288
1       345040168
2       345303192
1       345320560
2       345583536
1       345599016
2       345861416
1       345878976
2       346141304
1       346156848
2       346420616
1       346436016
2       346720296
1       346735888
2       346997144
1       347012440
2       347272992
1       347288360
2       347549712
1       347565216
2       347827584
1       347844376
2       348107688
1       348124824
2       348387344
1       348403528
2       348694080
1       348711336
2       348974016
1       348990776
2       349253168
1       349269216
2       349532072
1       349548152
2       349810560
1       349826160
2       350089328
1       350105568
2       350368624
1       350383816
2       350669568
1       350685224
2       350947416
1       350962952
2       351225784
1       351241184
2       351504176
1       351521288
2       351783304
1       351799864
2       352062304
1       352079584
2       352342216
1       352358584
2       352621344
1       352637264
2       352921528
1       352938544
2       353202184
1       353217584
2       353479680
1       353495312
2       353758192
1       353774064
2       354038160
1       354054080
2       354317424
1       354333472
2       354597008
1       354613144
2       354913464
1       355017440
2       355281016
1       355296280
2       355558808
1       355575504
2       355837696
1       355855136
2       356118072
1       356134152
2       356395448
1       356412744
2       356675536
1       356691344
2       356976968
1       356992904
2       357255832
1       357271208
2       357534360
1       357550072
2       357813576
1       357829464
2       358094152
1       358109760
2       358372656
1       358388512
2       358650904
1       358666584
2       358929688
1       358946008
2       359231936
1       359248808
2       359512984
1       359529584
2       359793144
1       359808552
2       360071016
1       360086976
2       360349672
1       360365376
2       360628384
1       360644256
2       360906728
1       360922216
2       361212632
1       361228464
2       361492032
1       361507640
2       361771392
1       361787464
2       362049424
1       362065856
2       362329344
1       362345016
2       362607728
1       362625744
2       362888168
1       362905200
2       363190040
1       363206000
2       363469504
1       363485064
2       363748432
1       363764088
2       364026984
1       364042296
2       364304344
1       364319896
2       364583352
1       364599080
2       364861240
1       364877304
2       365139176
1       365154936
0       0
2       365245808
1       371808112
Press any key to continue ...
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 11, 2013, 08:29:49 PM
prescott w/htt, xp mce2005 sp3
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Mode Sleep 0    3983 3322 3083 2813 2880 2797 2865 2895 2805 2820 2827 2828 2872
2865 2910 2842 2865 2895 2835 2858 2812 2805 19463 2895 2813 2805 2887 2828 2805
2880 2873 2865 2888 2843 2910 3165 3000 2797 2857 2798 2820 2857 2828 2850 2880
2828 2812 2918 2843 2880 2902 2842 2842 2895 2828 2835 2850 2790 2857 2813 2813
3037 2835 2805 2895 2835 2828 2880 2865 2813 2850 2880 2865 3405 4778 3772 3173
3037 2820 2812 2873 2827 2835 2933 2857 2910 2910 2902 2880 2880 2835 2880 2910
2873 2865 2873 2805 2842 2910 2843 2820 2910 2782 3502 3337 2827 2895 2887 2805
2873 2880 2827 2813 2835 2850 2850 2865 2820 2865 2910 2850 2887 2813 2918 2888
2865 2827 2880 3082 2828 2835 2895 2835 2888 2812 2805 2850 2888 2827 2842 2873
5295 3720 4395 3563 3045 3015 3008 2888 2910 2857 2797 2842 2933 2865 2850 2895
2835 2805 2827 2903 2842 2805 2865 2805 2887 2865 2820 2827 2902 2805 2850 2850
2865 2843 2805 2783 2888 2888 2820 2932 2865 2925 3082 2857 2812 2858 2835 2843
2835 2850 2827 2835 2865 2865 2775 2880 2827 2857 3187 3188

Mode SwitchToThread     855 862 855 863 855 855 862 862 862 862 855 863 863 862
877 863 855 855 855 863 870 855 863 863 863 863 855 862 863 855 862 863 863 862
862 855 862 855 855 862 855 862 855 862 862 863 863 862 862 863 863 863 855 863
855 863 862 855 863 855 863 862 855 863 855 863 863 863 862 877 862 855 863 863
877 862 855 870 862 863 862 863 855 863 855 862 855 863 862 862 855 862 862 862
855 863 863 862 863 863 863 855 878 855 855 863 862 863 862 855 862 863 863 862
863 855 863 855 863 863 862 855 863 863 862 862 863 862 855 863 862 863 863 855
855 855 855 863 855 863 855 855 862 863 863 863 862 863 855 863 863 863 855 863
862 862 855 863 862 862 855 855 855 855 862 862 863 855 863 855 855 863 863 855
855 862 862 863 862 862 863 862 863 862 855 862 855 863 855 863 855 863 1147 1320
862 862 863 863 863 855 863

Mode loop       105 105 105 105 105 105 105 105 105 105 98 105 98 105 105 105 105
105 105 105 105 105 105 105 105 105 105 105 105 105 105 105 105 105 105 105 105
105 105 105 105 98 105 105 105 105 97 105 105 105 105 105 105 105 105 105 105
105 105 105 105 105 105 105 105 105 105 105 97 105 105 105 105 105 105 105 105 105
105 97 105 105 105 105 105 105 105 105 105 105 105 105 105 105 105 97 105 105
105 97 105 105 97 98 105 105 105 105 105 105 105 105 105 105 105 105 105 97 105
97 105 105 105 97 105 105 105 105 105 105 105 105 105 105 105 105 105 105 105 105
105 98 105 105 113 105 97 97 105 98 105 105 105 105 105 105 105 105 105 97 105
105 105 105 105 98 105 98 105 105 105 105 105 98 105 105 105 105 105 105 97 105
105 105 105 105 97 105 98 105 105 105 105 98 105 105 105 105 105 105 105

Mode nothing    105 105 105 105 105 105 105 105 105 97 105 105 98 105 105 105 105
105 98 98 105 105 105 105 105 105 105 105 105 105 105 105 105 105 105 105 105
105 105 105 105 105 105 105 105 105 105 105 105 97 105 105 105 105 105 105 105 105
105 105 105 105 105 105 105 98 105 105 105 105 105 105 105 98 105 97 105 98 105
105 105 105 105 105 105 105 105 105 105 105 105 98 97 105 105 98 105 105 105
105 105 98 105 105 105 105 105 105 105 105 105 105 105 105 105 97 97 105 105 105
105 105 105 105 105 105 105 105 105 105 105 98 105 105 105 105 97 105 105 105 98
105 105 105 105 105 105 105 105 97 105 105 105 98 105 105 105 105 105 105 105
105 97 105 105 97 105 105 98 105 105 105 105 105 105 105 105 105 105 105 105 97
105 105 105 105 105 97 105 105 105 105 105 105 105 105 105 105 105 97 105
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 11, 2013, 08:32:55 PM
here is a little program i tried....
;###############################################################################################

        .XCREF
        .NoList
        INCLUDE    \Masm32\Include\Masm32rt.inc
        .686p
        .MMX
        .XMM
        .List

;###############################################################################################

        .CODE

;***********************************************************************************************

_main   PROC

        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask,eax,1
        INVOKE  Sleep,750
        INVOKE  Sleep,0

        mov     esi,10

Loop00: rdtsc
        push    edx
        push    eax
        INVOKE  Sleep,0
        dec     esi
        jnz     Loop00

        pop     ebx
        pop     edi
        mov     esi,9

Loop01: xchg    eax,ebx
        mov     edx,edi
        pop     ebx
        pop     edi
        sub     eax,ebx
        sbb     edx,edi
        print   ustr$(eax),13,10
        dec     esi
        jnz     Loop01

        print   chr$(13,10)
        inkey
        INVOKE  ExitProcess,0

_main   ENDP

;###############################################################################################

        END     _main

and the results on my machine...
2137
2138
2107
2055
2100
2078
2085
2167
2378
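
To make it easier to see what the program prints, here is a sketch in Python (not part of the original source) that mirrors its stack logic: ten rdtsc readings are pushed as high/low dwords, then popped newest-first and subtracted with sub/sbb. The function names and the sample timestamps are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def sub_sbb(lo_new, hi_new, lo_old, hi_old):
    """Emulate `sub eax,ebx` / `sbb edx,edi` on 32-bit halves."""
    lo = lo_new - lo_old
    borrow = 1 if lo < 0 else 0
    hi = hi_new - hi_old - borrow
    return lo & MASK32, hi & MASK32

def intervals(timestamps):
    """timestamps: 64-bit TSC readings in the order they were taken.
    Returns the low dword of each difference, newest pair first,
    which is the order in which the program prints them."""
    stack = []
    for t in timestamps:
        stack.append((t >> 32) & MASK32)   # push edx (high dword)
        stack.append(t & MASK32)           # push eax (low dword)
    lo_new, hi_new = stack.pop(), stack.pop()
    out = []
    while stack:
        lo_old, hi_old = stack.pop(), stack.pop()
        lo, _hi = sub_sbb(lo_new, hi_new, lo_old, hi_old)
        out.append(lo)                     # only eax is printed
        lo_new, hi_new = lo_old, hi_old
    return out

# Hypothetical readings, chosen so that one pair crosses a 32-bit boundary.
print(intervals([0xFFFFFF00, 0x100000100, 0x100000A00]))
```

Each printed value is therefore the gap between two consecutive rdtsc reads, i.e. one Sleep(0) call plus the loop overhead.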


according to Reed Copsey's comment on this page...
http://stackoverflow.com/questions/1383943/switchtothread-vs-sleep1
Quote: In general, Sleep(0) will be much more likely to yield a timeslice, and
will ALWAYS yield to the OS, even if there are no other threads waiting.
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 11, 2013, 08:45:36 PM
Quote from: dedndave on May 11, 2013, 08:32:55 PM
here is a little program i tried...
Results on my machine:
912
912
912
912
888
912
900
912
1152

But what does it measure?
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 11, 2013, 08:57:01 PM
i am guessing that's the length of a time-slice, in clock cycles   :P

we could probably bump the process priority level for better results
Title: Re: SwitchToThread vs Sleep(0)
Post by: hutch-- on May 11, 2013, 09:46:06 PM
It's been a while, but from regularly using SleepEx(): you need to set it to 1 or greater, not zero, as it returns immediately if nothing else is pending. Back in the PIV days any call to SleepEx() with a value under about 20 yielded about 20 ms anyway, so as a rough guess the timeslice on a PIV was about 20 ms.
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 12, 2013, 02:34:36 PM
Quote from: dedndave on May 11, 2013, 08:32:55 PM
according to Reed Copsey's comment on this page...
http://stackoverflow.com/questions/1383943/switchtothread-vs-sleep1
Quote: In general, Sleep(0) will be much more likely to yield a timeslice, and
will ALWAYS yield to the OS, even if there are no other threads waiting.

That's without doubt, Dave. Sleep(0) is, for instance, the only way for the code to switch the core, and this behaviour - dropping the timeslice - is a "system promise". The only problem is that if no other threads are waiting hungrily for CPU time, i.e., if the other threads are idle (Get(Peek)Message/WaitForSingle(Multiple)Object(s)/Sleep(Ex) etc.), then the time spent in Sleep(0) is, crudely speaking, the time needed to switch to kernel mode and let the scheduler check for waiting threads; if there are none, it just reschedules our thread again. This does not take many cycles, and that is probably the timing your example shows :t

NtDelayExecution (ntdll) is what underlies Sleep(Ex). It takes two parameters: the first is the same as the second parameter of SleepEx (bAlertable), and for Sleep it is 0; the second is a pointer to a QWORD which holds the delay as a negative count of 100-nanosecond units (this does not mean that user mode code gets delays with a resolution of 100 nanoseconds). I.e., for a delay of 123 milliseconds it needs to be called as:


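push 0FFFFFFFFh     ; high dword of -1230000
push 0FFED3B50h     ; low dword of -1230000 (123 ms in 100 ns units)
push esp            ; pointer to the QWORD interval just pushed
push 0              ; Alertable = FALSE
call NtDelayExecution
pop edx             ; discard the QWORD interval
pop edx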
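
As a cross-check of the constant above, here is a minimal Python sketch (not part of the original post; the helper name delay_to_dwords is made up for illustration). It assumes, as described above, that a relative delay is a negative count of 100 ns units, and splits the 64-bit value into the two dwords that get pushed:

```python
def delay_to_dwords(milliseconds):
    """Return (low, high) dwords of a relative delay interval.

    Assumption: relative delays are negative counts of 100 ns units,
    stored as a 64-bit two's complement value.
    """
    interval = -milliseconds * 10_000          # 1 ms = 10,000 units of 100 ns
    as_u64 = interval & 0xFFFFFFFFFFFFFFFF     # two's complement, 64 bits
    return as_u64 & 0xFFFFFFFF, as_u64 >> 32


low, high = delay_to_dwords(123)
print(f"{high:08X}h {low:08X}h")   # FFFFFFFFh FFED3B50h
```

The high dword is pushed first so that the low dword ends up at the lower address, which is where esp points when it is pushed as the interval pointer.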


BTW, your source may be changed a bit :biggrin:

;###############################################################################################

        .XCREF
        .NoList
        INCLUDE    \Masm32\Include\Masm32rt.inc
        include \masm32\include\ntdll.inc
        includelib \masm32\lib\ntdll.lib
        .686p
        .MMX
        .XMM
        .List

;###############################################################################################

.data?
zeroqword   dq  ?

        .CODE

;***********************************************************************************************

_main   PROC

        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask,eax,1
        INVOKE  Sleep,750
        INVOKE  Sleep,0

        mov     esi,10

Loop00: rdtsc
        push    edx
        push    eax
        INVOKE  NtDelayExecution,0,offset zeroqword
        dec     esi
        jnz     Loop00

        pop     ebx
        pop     edi
        mov     esi,9

Loop01: xchg    eax,ebx
        mov     edx,edi
        pop     ebx
        pop     edi
        sub     eax,ebx
        sbb     edx,edi
        print   ustr$(eax),13,10
        dec     esi
        jnz     Loop01

        print   chr$(13,10)
        inkey
        INVOKE  ExitProcess,0

_main   ENDP

;###############################################################################################

        END     _main
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 12, 2013, 03:19:25 PM
Quote from: Antariy on May 12, 2013, 02:34:36 PM
Sleep(0) is, for instance, the only way for the code to switch the core

A behaviour which we exclude with SetProcessAffinityMask. When limited to one core, there should be no difference between SwitchToThread and Sleep(0) except that one of them could be a bit slower...
Title: Re: SwitchToThread vs Sleep(0)
Post by: sinsi on May 12, 2013, 03:33:46 PM
What is the return value from SwitchToThread?
Quote: If there are no other threads ready to execute, the operating system does not switch execution to another thread, and the return value is zero.
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 12, 2013, 03:47:17 PM
Quote from: sinsi on May 12, 2013, 03:33:46 PM
What is the return value from SwitchToThread?

Mixed, about 50:50, except for "mode loop" and "nothing" where 1 is more frequent. And no influence on the timings, i.e. it seems you get a fresh timeslice anyway.

Thanks for the hint, John :t
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 12, 2013, 08:49:34 PM
Quote from: jj2007 on May 12, 2013, 03:19:25 PM
Quote from: Antariy on May 12, 2013, 02:34:36 PM
Sleep(0) is, for instance,

A behaviour which we exclude with SetProcessAffinityMask. When limited to one core, there should be no difference between SwitchToThread and Sleep(0) except that one of them could be a bit slower...

After SetProcess/ThreadAffinityMask, the running thread is not switched to the specified core immediately, and "the only way for the code to switch the core" is Sleep(0). At least according to MS.
That was the full meaning of that quote.
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 13, 2013, 06:02:03 PM
Still playing...
The assumption is that timings should be measured inside one timeslice. To detect if a timeslice has been left, it is further assumed that the cycle count for the test loop would jump over a certain value, say: 10000 cycles. Using the attached TestMe macro, it turns out that the slice is typically about 15ms or 35Mio cycles/2.3GHz (LiTs=Loops inside timeslice, CiTs=Cycles inside timeslice):

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
cycles  LiTs    CiTs
Loopct 0/0
23      6250034 143810186
23      1515100 34848007
23      4656881 107434202
23      1518678 34930235
23      3099625 71535622
23      1515225 34854130
23      1519461 34948261
23      1535141 35320238
23      6285264 145841188

Loopct 10/0
100     718797  71884004
100     348853  34885990
100     23890   2390032
100     323494  32349986
100     349989  34999560
100     352675  35268098
100     350274  35028107
100     351695  35200207
100     34103   3417695

Loopct 100/0
645     55709   35934108
645     54053   34865016
645     54667   35260579
645     54418   35100028
645     54795   35351193
653     1532    1000792
646     9954    6430330
651     378     246137
647     778     503492

Loopct 1000/0
8948    225     2013349
9030    3611    32609769
9030    3899    35211161
8842    225     1989386
9028    3497    31572751
9030    3886    35092325
8988    196     1761638
9029    3568    32218893
9030    3910    35310563

Loopct 1000/1
5836    6146    35872323
5836    180     1050405
5836    5650    32974841
5836    6060    35367503
5836    5997    35000922
5836    6053    35327492
5943    164     974610
5836    5463    31884485
5837    6078    35479713
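
The TestMe macro itself is attached to the original post and not shown here, so what follows is only a Python sketch of the bookkeeping described above, not the macro's actual logic. It assumes a list of per-loop cycle counts and treats any count above a threshold (10000 cycles, the value mentioned above) as a sign that the timeslice was left; the function name and the sample data are made up for illustration:

```python
def split_into_slices(loop_cycles, max_cycles=10_000):
    """Group per-loop cycle counts into (LiTs, CiTs) pairs.

    A loop that takes more than max_cycles is assumed to contain a
    context switch; it ends the current slice and is not counted.
    """
    slices = []
    loops = cycles = 0
    for c in loop_cycles:
        if c > max_cycles:
            if loops:
                slices.append((loops, cycles))
            loops = cycles = 0
        else:
            loops += 1
            cycles += c
    if loops:
        slices.append((loops, cycles))
    return slices


# Made-up sample: three fast loops, one interrupted loop, two fast loops.
print(split_into_slices([23, 23, 23, 50_000, 23, 23]))   # [(3, 69), (2, 46)]
```

With a threshold that is too low for a given CPU, ordinary loop iterations get mistaken for slice boundaries, which would produce LiTs values of 1 or 2.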


One odd observation regarding the test loop:
                push loopct
                .Repeat        ; fake activity here
                        fild dword ptr [esp]
                        if usefldpi
                                fldpi        ; include these two...
                                fmul        ; ... and the code gets faster (!), see "1000/1" above.
                        endif
                        movd xmm0, dword ptr [esp]        ; ML 6.14 encodes this as movd mm0, ...
                        fstp st
                        dec dword ptr [esp]
                .Until Sign?
                pop eax
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 13, 2013, 09:50:28 PM
prescott w/htt, xp mce2005 sp3
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
cycles  LiTs    CiTs
Loopct 0/0
200     28947   5808113
200     12169   2436278
201     12265   2473065
200     12687   2539852
201     85225   17205435
202     12547   2537310
201     41483   8374245
202     12561   2542358
200     12623   2527148


Loopct 10/0
232     1       105
225     1       98
232     1       105
232     1       105
232     1       105
232     1       105
232     1       105
232     1       105
232     1       105


Loopct 100/0
1399    2052    2872170
1667    1432    2388367
1631    1565    2552565
1417    1568    2222145
1403    1809    2538772
1417    1668    2364413
1667    1533    2556015
1499    1703    2553690
1641    1552    2546775


Loopct 1000/0
232     1       105
5092    2       10057
5043    2       9960
5043    2       9960
5043    2       9960
5024    2       9922
5025    2       9923
5021    2       9915
5024    2       9922


Loopct 1000/1
240     1       113
7209    330     2378948
7210    353     2545200
7210    354     2552317
7208    354     2551800
7211    351     2530950
7208    354     2551830
7209    354     2552212
7208    351     2530140


probably not what you expected to see   :P
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 13, 2013, 11:25:41 PM
Quote from: dedndave on May 13, 2013, 09:50:28 PM
prescott w/htt, xp mce2005 sp3
..
probably not what you expected to see   :P

The P4 apparently needs higher MaxCycles values to stabilise.
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 14, 2013, 12:21:45 AM
well - not sure it's just the P4
the HAL is probably a little different for each OS/CPU combination

one thing about my P4 is that it uses hyper-threading for the second logical core
that surely changes how time slices are allotted

i don't think context switches between logical cores are handled the same as between physical cores
i.e. - the OS can't change switching between hyper-threaded contexts - it's "hard-wired"

and - the fact that i am using mce2005 probably makes my machine unique against other members - lol
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 14, 2013, 12:48:48 AM
i found this in the registry
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex]
"MaxQueryTimeslice"=dword:00000032

i think that's a timeslice quantity, though (i.e. 50 time slices)

a possibility that we may have overlooked is that the length of a time-slice could be dynamic
so - the length changes with conditions   :P
that would certainly explain why there is so little official documentation on the subject
Title: Re: SwitchToThread vs Sleep(0)
Post by: hutch-- on May 14, 2013, 01:12:40 AM
This is a bit crude, but it's an attempt to get some idea of the duration of task switching intervals using SleepEx().



IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

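    LOCAL du    :DWORD
    LOCAL tc    :DWORD
    LOCAL cn    :QWORD              ; QueryPerformanceCounter writes a 64-bit LARGE_INTEGER

    push esi

    mov du, 0                       ; start duration at 0

    fn QueryPerformanceCounter,ADDR cn
    mov eax, DWORD PTR cn           ; low dword of the counter is enough for short intervals
    mov esi, eax

  @@:
    fn SleepEx,du,0
    add du, 1
    fn QueryPerformanceCounter,ADDR cn
    mov eax, DWORD PTR cn
    mov tc, eax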

    sub esi, tc
    neg esi

    print ustr$(du)," = "
    print ustr$(esi),13,10

    mov esi, tc

    cmp du, 64
    jbe @B

    pop esi

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: SwitchToThread vs Sleep(0)
Post by: Tedd on May 14, 2013, 02:19:00 AM
There are so many variables that you won't get any consistent results.

Time slices last up to a selected number of milliseconds (they may be shorter due to various interruptions, or yielding while waiting for I/O operations.)
The number of milliseconds any thread gets depends on: its current priority, whether the program window currently 'has focus', what other processes are running and their priorities, whether scheduling is set to favour programs or services (user modifiable), ...
And then the number of clock cycles you actually get within that time depends on a number of things, namely CPU clock frequency and throttling.



More on how 'simple' it is: http://support.microsoft.com/kb/259025
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 14, 2013, 03:46:24 AM
Quote from: Tedd on May 14, 2013, 02:19:00 AM
More on how 'simple' it is: http://support.microsoft.com/kb/259025

Good link, Tedd, thanks. For our main purpose, timing, much of this complexity is irrelevant: We have a high priority foreground process with no I/O.

The code I posted above shows that some cycle counts occur more often. For my Celeron, it's 1.445 Mio cycles - that is 0.9 milliseconds. For timing purposes, context switches should be avoided because they distort the results. So the goal is to get the highest possible cycle count within one time slice. Strange that Windows does not have a GetLenOfQuantum API...
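
For reference, the cycles-to-time conversion used in this thread can be sketched in Python as follows. This is an illustration only; it assumes the counter ticks at the nominal CPU frequency (no throttling), and the function name is made up:

```python
def cycles_to_ms(cycles, cpu_mhz):
    """Convert a cycle count to milliseconds at a fixed clock rate."""
    return cycles / (cpu_mhz * 1000.0)   # MHz * 1000 = cycles per millisecond


print(round(cycles_to_ms(1_445_000, 1600), 3))    # 0.903 (Celeron M, 1.6 GHz)
print(round(cycles_to_ms(35_000_000, 2300), 1))   # 15.2  (AMD, about 2.3 GHz)
```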
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 14, 2013, 04:15:42 AM
i would think you could derive such a value from the registry, if you knew how to manipulate which values   :P
learn all you can about the HAL
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 14, 2013, 05:46:08 AM
Quote from: dedndave on May 14, 2013, 04:15:42 AM
i would think you could derive such a value from the registry

Win32PrioritySeparation ... not very useful  :(
Title: Re: SwitchToThread vs Sleep(0)
Post by: qWord on May 14, 2013, 07:35:56 AM
There is a description in the book "Windows Internals" which shows how to determine the clock cycles per quantum - it can be derived from the current timer resolution:
include \masm32\include\masm32rt.inc
include \masm32\include\advapi32.inc
includelib \masm32\lib\advapi32.lib

NTQUERYTIMERRESOLUTION typedef proto MinimumResolution:PULONG,MaximumResolution:PULONG,ActualResolution:PULONG
PNTQUERYTIMERRESOLUTION typedef ptr NTQUERYTIMERRESOLUTION

UULONG union
    qw QWORD ?
    ul ULONG ?
UULONG ends

.const
    c0 REAL8 1.0E6
    c1 REAL8 100.0E-9
    c3 REAL8 0.33333333333333333
.data?
    NtQueryTimerResolution PNTQUERYTIMERRESOLUTION ?
.code
main proc uses esi
LOCAL hKey:HKEY
LOCAL ulMHz:UULONG,_size:DWORD
LOCAL ulMinimumResolution:UULONG,ulMaximumResolution:UULONG,ulActualResolution:UULONG
LOCAL MinimumResolution:REAL8,MaximumResolution:REAL8,ActualResolution:REAL8,MHz:REAL8
LOCAL r8:REAL8
    .repeat
    ; get undocumented function NtQueryTimerResolution :-|
        .if !rvx(NtQueryTimerResolution = GetProcAddress,rv(GetModuleHandle,"Ntdll.dll"),"NtQueryTimerResolution")
            print "error: can't locate NtQueryTimerResolution",13,10
            .break
        .endif
       
        mov _size,4
        .if rvx(esi = RegOpenKeyEx,HKEY_LOCAL_MACHINE,"HARDWARE\DESCRIPTION\System\CentralProcessor\0",0,KEY_READ,&hKey) != ERROR_SUCCESS || \
            rvx(RegQueryValueEx,hKey,"~MHz",0,0,&ulMHz, &_size) != ERROR_SUCCESS
            print "error: can't read CPU frequency from registry",13,10
            .break         
        .endif
       
        mov ulMHz.ul[4],0
        fild ulMHz.qw
        fmul c0
        fstp MHz
       
        fnc crt_printf,"CPU frequency: %u [MHz]\n",ulMHz.ul
       
        fn NtQueryTimerResolution,&ulMinimumResolution,&ulMaximumResolution,&ulActualResolution
       
        FOR var,<MinimumResolution,MaximumResolution,ActualResolution>
            mov ul&var&.ul[4],0
            fild ul&var&.qw
            fmul c1
            fst var     ; -> seconds
           
            fmul MHz
            fmul c3     ; var*(1/3) -> Windows Internal: "each quantum unit is one-third of a clock interval"
            fstp r8     ; -> clocks per quantum
            fnc crt_printf,"Timer: &var : %.6G [s] -\r clocks/quantum: %G\n",var,r8
        ENDM

    .until 1
   
    inkey
    exit
main endp
end main

They also mention that there is a variable that can be read out in kernel mode: KiCyclesPerClockQuantum.

for my i7QM:
CPU frequency: 2294 [MHz]
Timer: MinimumResolution : 0.0156001 [s] -> clocks/quantum: 1.19289E+007
Timer: MaximumResolution : 0.0005 [s] -> clocks/quantum: 382333
Timer: ActualResolution : 0.001 [s] -> clocks/quantum: 764667
Press any key to continue ...

AFAICS this result only applies when the CPU is not throttled.
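
The arithmetic in the program above boils down to a few multiplications. The following Python sketch repeats it for the i7QM figures. The resolution values 156001, 5000 and 10000 (in 100 ns units) are inferred from the printed seconds rather than read from the system, and the function name is made up:

```python
def clocks_per_quantum(resolution_100ns, cpu_mhz):
    """Clock cycles per quantum unit for a given timer resolution.

    Follows the rule quoted from "Windows Internals" in the code above:
    each quantum unit is one third of a clock interval.
    """
    seconds = resolution_100ns * 100e-9        # 100 ns units -> seconds
    return seconds * cpu_mhz * 1e6 / 3.0       # cycles per interval / 3


for name, res in (("Minimum", 156001), ("Maximum", 5000), ("Actual", 10000)):
    print(f"{name}Resolution: {clocks_per_quantum(res, 2294):.6G} clocks/quantum")
```

As with the original program, the result is only meaningful if the CPU actually runs at the frequency stored in the registry.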
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 14, 2013, 09:22:29 AM
prescott w/htt, xp mce2005 sp3
CPU frequency: 3000 [MHz]
Timer: MinimumResolution : 0.015625 [s] -> clocks/quantum: 1.5625E+007
Timer: MaximumResolution : 0.001 [s] -> clocks/quantum: 1E+006
Timer: ActualResolution : 0.0009766 [s] -> clocks/quantum: 976600
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 14, 2013, 01:55:00 PM
I have some doubts that the timer resolution has any relation to the scheduler's work - there are too many things which influence the timeslice (just like Tedd said). I suspect that the values we get from the native API - the timer resolution - are for the multimedia timers, not for the timeslice.

There is also the function:

NTSYSAPI
NTSTATUS
NTAPI
NtSetTimerResolution(

  IN ULONG                DesiredResolution,
  IN BOOLEAN              SetResolution,
  OUT PULONG              CurrentResolution );


But obviously that doesn't relate to the timeslice either.
BTW, we can probably try to change timer resolution via MM APIs and then query it with native API - just to check.

Anyway, results for the code:

CPU frequency: 2133 [MHz]
Timer: MinimumResolution : 0.015625 [s] -> clocks/quantum: 1.11094E+007
Timer: MaximumResolution : 0.001 [s] -> clocks/quantum: 711000
Timer: ActualResolution : 0.0009766 [s] -> clocks/quantum: 694363
Press any key to continue ...


BTW, the minimum resolution here (15.625 milliseconds) is the default resolution of Sleep(Ex), GetTickCount and timeGetTime under NT-based systems.
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 14, 2013, 03:00:13 PM
After playing briefly with these functions (as well as the MM APIs...), none of them seemed to change anything on my system at all :greensml:


Min res: 156250, Max res: 10000, Actual res: 9766

Sequental calls to GetTickCount/Sleep,20/GetTickCount difference: 16

NtSTR NTSTATUS: 00000000, actual: 9766

Min res: 156250, Max res: 10000, Actual res: 9766

Sequental calls to GetTickCount/Sleep,20/GetTickCount difference: 15

Press any key to continue ...



include \masm32\include\masm32rt.inc
include \masm32\include\ntdll.inc
includelib \masm32\lib\ntdll.lib
include \masm32\include\winmm.inc
includelib \masm32\lib\winmm.lib

;NTQUERYTIMERRESOLUTION typedef proto MinimumResolution:PULONG,MaximumResolution:PULONG,ActualResolution:PULONG
;PNTQUERYTIMERRESOLUTION typedef ptr NTQUERYTIMERRESOLUTION

UULONG union
    qw QWORD ?
    ul ULONG ?
UULONG ends

.const
    c0 REAL8 1.0E6
    c1 REAL8 100.0E-9
    c3 REAL8 0.33333333333333333
.data?
;    NtQueryTimerResolution PNTQUERYTIMERRESOLUTION ?
.code
main proc uses esi
LOCAL hKey:HKEY
LOCAL ulMHz:UULONG,_size:DWORD
LOCAL ulMinimumResolution:ULONG,ulMaximumResolution:ULONG,ulActualResolution:ULONG
LOCAL MinimumResolution:REAL8,MaximumResolution:REAL8,ActualResolution:REAL8,MHz:REAL8
LOCAL r8:REAL8


;invoke timeBeginPeriod,20
;invoke Sleep,0



    .repeat

        fn NtQueryTimerResolution,addr ulMinimumResolution,addr ulMaximumResolution,addr ulActualResolution

invoke crt_printf,CTXT("Min res: %lu, Max res: %lu, Actual res: %lu",10,10),ulMinimumResolution,ulMaximumResolution,ulActualResolution

invoke GetTickCount
mov ebx,eax
invoke Sleep,20
invoke GetTickCount
sub eax,ebx
invoke crt_printf,CTXT("Sequental calls to GetTickCount/Sleep,20/GetTickCount difference: %d",10,10),eax

     mov ulMinimumResolution,50000
     invoke NtSetTimerResolution,ulMinimumResolution,1,addr ulActualResolution   ; DesiredResolution is a ULONG passed by value (100 ns units), not a pointer
     invoke crt_printf,CTXT("NtSTR NTSTATUS: %p, actual: %lu",10,10),eax,ulActualResolution
     invoke Sleep,0

        fn NtQueryTimerResolution,addr ulMinimumResolution,addr ulMaximumResolution,addr ulActualResolution

invoke crt_printf,CTXT("Min res: %lu, Max res: %lu, Actual res: %lu",10,10),ulMinimumResolution,ulMaximumResolution,ulActualResolution

invoke GetTickCount
mov ebx,eax
invoke Sleep,20
invoke GetTickCount
sub eax,ebx
invoke crt_printf,CTXT("Sequental calls to GetTickCount/Sleep,20/GetTickCount difference: %d",10,10),eax

    .until 1

;invoke timeEndPeriod,1

    inkey
    exit
main endp
end main





Obviously Michael's experimental code is the way to check the timeslice length in real time.

Having changed it a bit, I get, for instance, these results:

Commiting the memory loops... 3 2 1
Starting threads...
Flagging threads to start...
2       -2146197512
1       -2128130376
2       -2061490392
1       -1994868136
2       -1928309152
1       -1861226840
2       -1794693936
1       -1728372904
2       -1661278560
1       -1594765592
2       -1528019176
1       -1461520648
2       -1461783008
1       -1395112944
2       -1328442808
1       -1261772736
2       -1125095944
1       -1061480520
2       -1061762488
1       -927819416
2       -928423880
1       -795053216
2       -727995480
1       -728413624
2       -661743592
1       -595073432
2       -528403400
1       -395034176
2       -328250360
1       -261685496
2       -194617560
1       -128355360
2       -53439128
-1      -1
1       4988000
2       72073936   \
1       138326792  / timeslice ~31 ms
2       138294296
1       204964432
2       338715304
1       405007400
2       404974600
1       538372152
2       605409656  \ timeslice 199 microseconds
1       604984904  /\
2       738475176   / timeslice ~62 ms
1       738323552
2       872103120
1       938366456
2       938333784
1       1071705408
2       1143388808
1       1205660120
2       1205014048
1       1338386880
2       1405433584
1       1471723520
2       1538516608
1       1605088400
2       1672120992
1       1671703120
2       1738373224
1       1871743168
2       1871711800
1       2005082920
2       2079923832
1       2138425592
2       2138392064
1       -2023205624
2       -1956412152
1       -1889864432
2       -1822804288
1       -1756522312
2       -1756556240
1       -1689886128
2       -1556117160
1       -1489845864
2       -1423056504
1       -1423205784
2       -1289432992
1       -1223167360
2       -1223197168
1       -1145165320
Press any key to continue ...


And many of the results fluctuate - obviously the timeslice is not "hardcoded".

Slightly modified code:

;==============================================================================
include \masm32\include\masm32rt.inc
.686
;==============================================================================
.data
    hThread1 dd 0
    hThread2 dd 0
    pBuff    dd 0
    count    dd 0
    lok      dd 1
.code
;==============================================================================
ThreadProc1 proc uses ebx lpParameter:LPVOID

.while lok
.endw

    .WHILE count < 10000000
        xor eax, eax
        cpuid
        rdtsc
        mov ecx, count
        mov ebx, pBuff
        mov DWORD PTR [ebx+ecx*8], 1
        mov DWORD PTR [ebx+ecx*8+4], eax
        inc count
    .ENDW
    ret
ThreadProc1 endp
;==============================================================================
ThreadProc2 proc uses ebx lpParameter:LPVOID

.while lok
.endw

    .WHILE count < 10000000
        xor eax, eax
        cpuid
        rdtsc
        mov ecx, count
        mov ebx, pBuff
        mov DWORD PTR [ebx+ecx*8], 2
        mov DWORD PTR [ebx+ecx*8+4], eax
        inc count
    .ENDW
    ret
ThreadProc2 endp
;==============================================================================
start:
;==============================================================================

    invoke GetCurrentProcess
    invoke SetProcessAffinityMask, eax, 1

    ;--------------------------------------------------------------------------
    ; Minimize interruptions (THESE PRIORITY LEVELS NOT SAFE FOR SINGLE CORE).
    ;--------------------------------------------------------------------------

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, REALTIME_PRIORITY_CLASS
    invoke GetCurrentThread
    invoke SetThreadPriority, eax, THREAD_PRIORITY_TIME_CRITICAL

    ;-----------------------------------------------------------
    ; Buffer entries will be a dword thread identifier (1 or 2)
    ; followed by the low-order dword of the current TSC.
    ;-----------------------------------------------------------

    mov ebx,10000000*8+1000

    mov pBuff, alloc(ebx)

    invoke crt_printf,CTXT("Commiting the memory loops...")

    mov esi,3
    @@:
    invoke crt_printf,CTXT(" %lu"),esi
    mov edi,pBuff
    mov ecx,ebx
    shr ecx,2
    or eax,-1
    rep stosd
    dec esi
    jnz @B

    invoke crt_printf,CTXT(10,"Starting threads...",10)
   

    invoke CreateThread, NULL, 0, ThreadProc1, 0, 0, NULL
    mov hThread1, eax
    invoke CreateThread, NULL, 0, ThreadProc2, 0, 0, NULL
    mov hThread2, eax

    invoke crt_printf,CTXT("Flagging threads to start...",10)
   
    and lok,0

    invoke WaitForMultipleObjects, 2, ADDR hThread1, TRUE, INFINITE

    ;--------------------------------------------------------------
    ; Display only buffer entries where thread identifier changed.
    ;--------------------------------------------------------------

    mov ebx, -1
    xor edi, edi
    mov esi, pBuff
  @@:
    add ebx, 1
    cmp ebx, 10000000
    ja  @F
    mov eax, [esi+ebx*8]
    cmp eax, edi
    je  @B
    mov edi, eax
    invoke crt_printf,CTXT("%d",9), eax
    mov eax, [esi+ebx*8+4]
    invoke crt_printf,CTXT("%d",10), eax
    jmp @B
  @@:
    inkey
    exit
;==============================================================================
end start
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 14, 2013, 03:38:02 PM
Celeron M, CPU frequency: 1600 [MHz]
Timer: MinimumResolution : 0.015625 s -> clocks/quantum: 8.33333E+006
Timer: MaximumResolution : 0.001 s -> clocks/quantum: 533333
Timer: ActualResolution : 0.015625 s -> clocks/quantum: 8.33333E+006


AMD Athlon(tm) Dual Core Processor 4450B, CPU frequency: 2304 [MHz]
Timer: MinimumResolution : 0.0156001 s -> clocks/quantum: 1.19809E+007
Timer: MaximumResolution : 0.0005 s -> clocks/quantum: 384000
Timer: ActualResolution : 0.0156 s -> clocks/quantum: 1.19808E+007


With my method I get 1.44E+6 clocks/slice for the Celeron and 3.5E+7 for the AMD...
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 14, 2013, 09:20:13 PM
Here is the further development of your idea, Michael :t

It clearly shows that for two concurrent CPU-greedy threads the timeslice is ~31 ms. For 6 threads I get an inconsistent timeslice - apparently there is not enough CPU time, and the OS tries to give a bigger timeslice to some threads randomly.
The last two...six "microseconds" values mean nothing: for the first thread the value is smaller because it terminates early (so it did not use its full timeslice), and for the last entry there is no next value to subtract from, so the final microseconds value is garbage.
If you set THREADS_COUNT to a higher value and sometimes get a negative microseconds value, that's just because of TSC overflow - the code uses a DWORD to store the TSC difference. That's OK.

To get the timeslice in clocks, just subtract the TSC value of the required_thread entry from that of the required_thread+1 entry.
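
The subtraction and the overflow case can be sketched in Python as follows (an illustration, not the program's own code; the function name is made up). Treating the difference as an unsigned 32-bit value keeps it correct across one wraparound of the low TSC dword. The sample values are taken from the 2-thread and 6-thread listings below:

```python
def tsc_delta(earlier, later):
    """Difference of two 32-bit TSC samples, correct across one wraparound."""
    return (later - earlier) & 0xFFFFFFFF


# Thread 1 -> thread 2 in the 2-thread run: no wraparound.
print(tsc_delta(1641942104, 1708313408))   # 66371304

# Thread 4 -> thread 5 in the 6-thread run: the low dword wrapped around.
print(tsc_delta(4219805016, 58180224))     # 133342504
```

The second value is about twice the first, which matches the ~62.5 ms that thread 4 shows on the lines without wraparound.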

You may play with the THREADS_COUNT equate; for example, set it to 8, comment out the process affinity setup and run the program on a 4-core machine :biggrin:

2 threads

Commiting the memory loops... 3 2 1
Starting threads...
Thread #1 started and waiting for synchronization flag...
Thread #2 started and waiting for synchronization flag...
All threads started and ready to work, flagging them...
ThrId: 1, Tsc: 1641942104 (31111.889036 microseconds)
ThrId: 2, Tsc: 1708313408 (31307.562426 microseconds)
ThrId: 1, Tsc: 1775102144 (31197.697475 microseconds)
ThrId: 2, Tsc: 1841656504 (31278.417108 microseconds)
ThrId: 1, Tsc: 1908383064 (31223.175253 microseconds)
ThrId: 2, Tsc: 1974991776 (32716.561587 microseconds)
ThrId: 1, Tsc: 2044786344 (29788.938317 microseconds)
ThrId: 2, Tsc: 2108335384 (33655.721859 microseconds)
ThrId: 1, Tsc: 2180133472 (28849.669293 microseconds)
ThrId: 2, Tsc: 2241678760 (31300.099845 microseconds)
ThrId: 1, Tsc: 2308451576 (31212.060132 microseconds)
ThrId: 2, Tsc: 2375036576 (31276.298335 microseconds)
ThrId: 1, Tsc: 2441758616 (31215.536420 microseconds)
ThrId: 2, Tsc: 2508351032 (31267.174485 microseconds)
ThrId: 1, Tsc: 2575053608 (31237.901665 microseconds)
ThrId: 2, Tsc: 2641693736 (31255.853111 microseconds)
ThrId: 1, Tsc: 2708372160 (31245.889252 microseconds)
ThrId: 2, Tsc: 2775029328 (31276.189583 microseconds)
ThrId: 1, Tsc: 2841751136 (18778.881644 microseconds)
ThrId: 2, Tsc: 2881812312 (-1350864.296328 microseconds)
Press any key to continue ...


3 threads

Commiting the memory loops... 3 2 1
Starting threads...
Thread #1 started and waiting for synchronization flag...
Thread #2 started and waiting for synchronization flag...
Thread #3 started and waiting for synchronization flag...
All threads started and ready to work, flagging them...
ThrId: 1, Tsc: 1901983592 (31258.440639 microseconds)
ThrId: 2, Tsc: 1968667536 (31243.706728 microseconds)
ThrId: 3, Tsc: 2035320048 (31251.514313 microseconds)
ThrId: 1, Tsc: 2101989216 (31254.746849 microseconds)
ThrId: 2, Tsc: 2168665280 (31248.443030 microseconds)
ThrId: 3, Tsc: 2235327896 (31251.956818 microseconds)
ThrId: 1, Tsc: 2301998008 (31254.038091 microseconds)
ThrId: 2, Tsc: 2368672560 (31251.694315 microseconds)
ThrId: 3, Tsc: 2435342112 (31251.371812 microseconds)
ThrId: 1, Tsc: 2502010976 (31252.346823 microseconds)
ThrId: 2, Tsc: 2568681920 (31250.186799 microseconds)
ThrId: 3, Tsc: 2635348256 (31252.196821 microseconds)
ThrId: 1, Tsc: 2702018880 (31251.971818 microseconds)
ThrId: 2, Tsc: 2768689024 (31250.891807 microseconds)
ThrId: 3, Tsc: 2835356864 (31252.076820 microseconds)
ThrId: 1, Tsc: 2902027232 (31253.385584 microseconds)
ThrId: 2, Tsc: 2968700392 (32028.237809 microseconds)
ThrId: 3, Tsc: 3037026552 (30476.185833 microseconds)
ThrId: 1, Tsc: 3102041704 (31300.549850 microseconds)
ThrId: 2, Tsc: 3168815480 (31437.782601 microseconds)
ThrId: 3, Tsc: 3235882016 (31016.742996 microseconds)
ThrId: 1, Tsc: 3302050344 (31252.950579 microseconds)
ThrId: 2, Tsc: 3368722576 (31482.340588 microseconds)
ThrId: 3, Tsc: 3435884168 (31020.080532 microseconds)
ThrId: 1, Tsc: 3502059616 (31252.913079 microseconds)
ThrId: 2, Tsc: 3568731768 (31303.572383 microseconds)
ThrId: 3, Tsc: 3635511992 (31198.237481 microseconds)
ThrId: 1, Tsc: 3702067504 (90.942245 microseconds)
ThrId: 2, Tsc: 3702261512 (231.943787 microseconds)
ThrId: 3, Tsc: 3702756320 (-1735686.009066 microseconds)
Press any key to continue ...


6 threads

Commiting the memory loops... 3 2 1
Starting threads...
Thread #1 started and waiting for synchronization flag...
Thread #2 started and waiting for synchronization flag...
Thread #3 started and waiting for synchronization flag...
Thread #4 started and waiting for synchronization flag...
Thread #5 started and waiting for synchronization flag...
Thread #6 started and waiting for synchronization flag...
All threads started and ready to work, flagging them...
ThrId: 1, Tsc: 3286510888 (156269.421697 microseconds)
ThrId: 2, Tsc: 3619882008 (124958.236731 microseconds)
ThrId: 3, Tsc: 3886456664 (156258.749080 microseconds)
ThrId: 4, Tsc: 4219805016 (-1950782.957939 microseconds)
ThrId: 5, Tsc: 58180224 (93757.269220 microseconds)
ThrId: 6, Tsc: 258193544 (156256.551556 microseconds)
ThrId: 1, Tsc: 591537208 (156425.453403 microseconds)
ThrId: 2, Tsc: 925241192 (124841.666706 microseconds)
ThrId: 3, Tsc: 1191567168 (156258.910332 microseconds)
ThrId: 4, Tsc: 1524915864 (62505.642405 microseconds)
ThrId: 5, Tsc: 1658259776 (93755.356699 microseconds)
ThrId: 6, Tsc: 1858269016 (156258.332826 microseconds)
ThrId: 1, Tsc: 2191616480 (156412.215759 microseconds)
ThrId: 2, Tsc: 2525292224 (124854.885600 microseconds)
ThrId: 3, Tsc: 2791646400 (156258.899082 microseconds)
ThrId: 4, Tsc: 3124995072 (62570.289363 microseconds)
ThrId: 5, Tsc: 3258476896 (93810.744805 microseconds)
ThrId: 6, Tsc: 3458604296 (156137.795257 microseconds)
ThrId: 1, Tsc: 3791694616 (156416.573306 microseconds)
ThrId: 2, Tsc: 4125379656 (-1888438.237293 microseconds)
ThrId: 3, Tsc: 96755480 (156259.544089 microseconds)
ThrId: 4, Tsc: 430105528 (62504.772396 microseconds)
ThrId: 5, Tsc: 563447584 (93863.631633 microseconds)
ThrId: 6, Tsc: 763687808 (156213.358584 microseconds)
ThrId: 1, Tsc: 1096939328 (156356.036394 microseconds)
ThrId: 2, Tsc: 1430495224 (124848.090526 microseconds)
ThrId: 3, Tsc: 1696834904 (156259.094084 microseconds)
ThrId: 4, Tsc: 2030183992 (62504.907397 microseconds)
ThrId: 5, Tsc: 2163526336 (93755.184197 microseconds)
ThrId: 6, Tsc: 2363535208 (156259.484088 microseconds)
ThrId: 1, Tsc: 2696885128 (156425.854658 microseconds)
ThrId: 2, Tsc: 3030589968 (124841.846708 microseconds)
ThrId: 3, Tsc: 3296916328 (156430.245956 microseconds)
ThrId: 4, Tsc: 3630630536 (62331.666753 microseconds)
ThrId: 5, Tsc: 3763603304 (93756.605463 microseconds)
ThrId: 6, Tsc: 3963615208 (-1857030.312519 microseconds)
ThrId: 1, Tsc: 1993872 (156520.648195 microseconds)
ThrId: 2, Tsc: 335900936 (124745.639405 microseconds)
ThrId: 3, Tsc: 602022056 (156259.457838 microseconds)
ThrId: 4, Tsc: 935371920 (62588.117058 microseconds)
ThrId: 5, Tsc: 1068891776 (93672.229540 microseconds)
ThrId: 6, Tsc: 1268723680 (156258.235324 microseconds)
ThrId: 1, Tsc: 1602070936 (156261.816614 microseconds)
ThrId: 2, Tsc: 1935425832 (125005.273495 microseconds)
ThrId: 3, Tsc: 2202100832 (156258.955332 microseconds)
ThrId: 4, Tsc: 2535449624 (66547.751616 microseconds)
ThrId: 5, Tsc: 2677416608 (89727.381393 microseconds)
ThrId: 6, Tsc: 2868832928 (156328.481093 microseconds)
ThrId: 1, Tsc: 3202330040 (156175.813173 microseconds)
ThrId: 2, Tsc: 3535501464 (125005.993503 microseconds)
ThrId: 3, Tsc: 3802178000 (156260.931604 microseconds)
ThrId: 4, Tsc: 4135531008 (62531.975193 microseconds)
ThrId: 5, Tsc: 4268931096 (-1919560.476443 microseconds)
ThrId: 6, Tsc: 173913536 (156507.905555 microseconds)
ThrId: 1, Tsc: 507793416 (124759.199554 microseconds)
ThrId: 2, Tsc: 773943464 (62501.903615 microseconds)
ThrId: 3, Tsc: 907279400 (62590.052079 microseconds)
ThrId: 4, Tsc: 1040803384 (2384.928585 microseconds)
ThrId: 5, Tsc: 1045891176 (87.525957 microseconds)
ThrId: 6, Tsc: 1046077896 (-490354.377001 microseconds)
Press any key to continue ...



TIMES_TO_SWITCH EQU 10
THREADS_COUNT EQU 2

;==============================================================================
include \masm32\include\masm32rt.inc
.686
;==============================================================================
.data
    hThread1 dd 0
    hThread2 dd 0
    pBuff    dd 0
    nextThr  dd 0
    lok      dd 1
.code


ThrCtl struct DWORD
thrId       dd  ?
switchCnt   dd  ?
ThrCtl ends


;==============================================================================
ThreadProc1 proc uses ebx esi edi lpParameter:ptr ThrCtl

mov esi,lpParameter

mov ecx,[esi].ThrCtl.thrId
mov ebx,pBuff
lea edi,[ecx*8-8]

push ecx
invoke crt_printf,CTXT("Thread #%d started and waiting for synchronization flag...",10),ecx
pop ecx

lock inc nextThr

.while lok
.endw


    .WHILE [esi].ThrCtl.switchCnt

        @loop1:
            cmp ecx,nextThr
        jnz @loop1
       
        xor eax, eax
        push ebx
        push ecx
        cpuid
        pop ecx
        pop ebx
        rdtsc
        mov DWORD PTR [ebx+edi], ecx
        mov DWORD PTR [ebx+edi+4], eax
        lea edi,[edi+THREADS_COUNT*8]

        lea eax,[ecx+1]
        cmp eax,THREADS_COUNT
        jbe @F
        mov eax,1
        @@:

        dec [esi].ThrCtl.switchCnt
        mov nextThr,eax
       
    .ENDW
    ret
ThreadProc1 endp
;==============================================================================

start proc
;==============================================================================
LOCAL hthrs[THREADS_COUNT]:DWORD
LOCAL tdd:DWORD

    invoke GetCurrentProcess
    invoke SetProcessAffinityMask, eax, 1

    ;--------------------------------------------------------------------------
    ; Minimize interruptions (THESE PRIORITY LEVELS NOT SAFE FOR SINGLE CORE).
    ;--------------------------------------------------------------------------

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, REALTIME_PRIORITY_CLASS
    invoke GetCurrentThread
    invoke SetThreadPriority, eax, THREAD_PRIORITY_TIME_CRITICAL

    ;-----------------------------------------------------------
    ; Buffer entries will be a dword thread identifier (1..THREADS_COUNT)
    ; followed by the low-order dword of the current TSC.
    ;-----------------------------------------------------------

    mov ebx,TIMES_TO_SWITCH*THREADS_COUNT*8+8

    invoke GlobalAlloc,GMEM_ZEROINIT,ebx
    mov pBuff,eax

    invoke crt_printf,CTXT("Commiting the memory loops...")

    mov esi,3
    @@:
    invoke crt_printf,CTXT(" %lu"),esi
    mov edi,pBuff
    mov ecx,ebx
    shr ecx,2
    xor eax,eax
    rep stosd
    dec esi
    jnz @B

    invoke crt_printf,CTXT(10,"Starting threads...",10)
   
    ;mov esi,THREADS_COUNT
    xor esi,esi
    @@:
    push TIMES_TO_SWITCH
    lea ecx,[esi+1]
    push ecx
    mov edx,esp
    invoke CreateThread, NULL, 0, ThreadProc1, edx, 0, addr tdd
    mov hthrs[esi*4],eax
    inc esi
    cmp esi,THREADS_COUNT
    jb @B

    @@:
    invoke Sleep,10
    cmp nextThr,THREADS_COUNT
    jb @B
    invoke crt_printf,CTXT("All threads started and ready to work, flagging them...",10)

    mov nextThr,1
   
    and lok,0

    invoke WaitForMultipleObjects, THREADS_COUNT, ADDR hthrs, TRUE, INFINITE

    ;--------------------------------------------------------------
    ; Display every buffer entry with the TSC delta to the next entry.
    ;--------------------------------------------------------------

    push eax
    push eax
    invoke QueryPerformanceFrequency,esp
    fild qword ptr [esp]
    mov dword ptr [esp],1000000
    fild dword ptr [esp]
    fdivp st(1),st(0)
       
   
    mov ebx, -1
    xor edi, edi
    mov esi, pBuff
  @@:
    and dword ptr [esp+4],0
    mov eax,[esi+edi+8+4]
    mov [esp],eax
    fild qword ptr [esp]
    mov eax,[esi+edi+4]
    mov [esp],eax
    fild qword ptr [esp]
    fsubp st(1),st(0)
    fdiv st(0),st(1)
    fstp qword ptr [esp]       
    invoke crt_printf,CTXT("ThrId: %d, Tsc: %lu (%lf microseconds)",10),dword ptr [esi+edi],dword ptr [esi+edi+4]
    add edi,8
    cmp edi,TIMES_TO_SWITCH*THREADS_COUNT*8
    jb @B
   
  @@:
    pop eax
    pop eax
   
    inkey
    exit
;==============================================================================
start endp
end start



Michael, your idea is very good :t
Title: Re: SwitchToThread vs Sleep(0)
Post by: qWord on May 14, 2013, 10:47:52 PM
Quote from: Antariy on May 14, 2013, 01:55:00 PM
I have some doubts that the timer resolution has any relation to the scheduler work
You're absolutely right, it sounds more than unrealistic...

BTW: http://technet.microsoft.com/en-us/sysinternals/bb963901, Chapter 5 is available as book preview: http://download.microsoft.com/download/1/4/0/14045A9E-C978-47D1-954B-92B9FD877995/97807356648739_SampleChapters.pdf
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 14, 2013, 11:17:17 PM
Quote from: Antariy on May 14, 2013, 09:20:13 PM
Michael, your idea is very good :t

I can echo that one :t

My "catch the outlier" approach works but it's not very practical. What I can observe, though, is that inside a timeslice you encounter about 1-3 outliers, i.e. cycle counts that are higher than expected but not caused by a context switch (which has a much higher cycle count, e.g. 100 times higher). These outliers might be hardware interrupts, but I am not sure about that.

From qWord's link:
Quote
A thread might not get to complete its quantum, however, because Windows implements a preemptive scheduler: if another thread with a higher priority becomes ready to run, the currently running thread might be preempted before finishing its time slice. In fact, a thread can be selected to run next and be preempted before even beginning its quantum!

That may explain occasional shorter time slices but not the outliers mentioned above.
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 14, 2013, 11:29:01 PM
Update.

To check whether Sleep(0) works, define the DROP_TIMESLICE equate; the output then shows that every thread gives up its timeslice early.


Commiting the memory loops... 3 2 1
Starting threads...
Thread #1 started and waiting for synchronization flag...
Thread #2 started and waiting for synchronization flag...
Thread #3 started and waiting for synchronization flag...
Thread #4 started and waiting for synchronization flag...
Thread #5 started and waiting for synchronization flag...
Thread #6 started and waiting for synchronization flag...
All threads started and ready to work, flagging them...
ThrId: 1, Tsc: 3973525984 (15160 clocks, 7.106328 microseconds)
ThrId: 2, Tsc: 3973541144 (14608 clocks, 6.847575 microseconds)
ThrId: 3, Tsc: 3973555752 (15240 clocks, 7.143828 microseconds)
ThrId: 4, Tsc: 3973570992 (14528 clocks, 6.810074 microseconds)
ThrId: 5, Tsc: 3973585520 (14568 clocks, 6.828825 microseconds)
ThrId: 6, Tsc: 3973600088 (14576 clocks, 6.832575 microseconds)
ThrId: 1, Tsc: 3973614664 (14600 clocks, 6.843825 microseconds)
ThrId: 2, Tsc: 3973629264 (14456 clocks, 6.776324 microseconds)
ThrId: 3, Tsc: 3973643720 (14584 clocks, 6.836325 microseconds)
ThrId: 4, Tsc: 3973658304 (14592 clocks, 6.840075 microseconds)
ThrId: 5, Tsc: 3973672896 (14568 clocks, 6.828825 microseconds)
ThrId: 6, Tsc: 3973687464 (14544 clocks, 6.817575 microseconds)
ThrId: 1, Tsc: 3973702008 (14520 clocks, 6.806324 microseconds)
ThrId: 2, Tsc: 3973716528 (14456 clocks, 6.776324 microseconds)
ThrId: 3, Tsc: 3973730984 (14576 clocks, 6.832575 microseconds)
ThrId: 4, Tsc: 3973745560 (14632 clocks, 6.858825 microseconds)
ThrId: 5, Tsc: 3973760192 (14576 clocks, 6.832575 microseconds)
ThrId: 6, Tsc: 3973774768 (14560 clocks, 6.825075 microseconds)
ThrId: 1, Tsc: 3973789328 (14528 clocks, 6.810074 microseconds)
ThrId: 2, Tsc: 3973803856 (14456 clocks, 6.776324 microseconds)
ThrId: 3, Tsc: 3973818312 (14576 clocks, 6.832575 microseconds)
ThrId: 4, Tsc: 3973832888 (14632 clocks, 6.858825 microseconds)
ThrId: 5, Tsc: 3973847520 (14576 clocks, 6.832575 microseconds)
ThrId: 6, Tsc: 3973862096 (14560 clocks, 6.825075 microseconds)
ThrId: 1, Tsc: 3973876656 (14568 clocks, 6.828825 microseconds)
ThrId: 2, Tsc: 3973891224 (14440 clocks, 6.768824 microseconds)
ThrId: 3, Tsc: 3973905664 (14552 clocks, 6.821325 microseconds)
ThrId: 4, Tsc: 3973920216 (14632 clocks, 6.858825 microseconds)
ThrId: 5, Tsc: 3973934848 (14576 clocks, 6.832575 microseconds)
ThrId: 6, Tsc: 3973949424 (14560 clocks, 6.825075 microseconds)
ThrId: 1, Tsc: 3973963984 (14528 clocks, 6.810074 microseconds)
ThrId: 2, Tsc: 3973978512 (14456 clocks, 6.776324 microseconds)
ThrId: 3, Tsc: 3973992968 (14576 clocks, 6.832575 microseconds)
ThrId: 4, Tsc: 3974007544 (14632 clocks, 6.858825 microseconds)
ThrId: 5, Tsc: 3974022176 (14576 clocks, 6.832575 microseconds)
ThrId: 6, Tsc: 3974036752 (14560 clocks, 6.825075 microseconds)
ThrId: 1, Tsc: 3974051312 (14528 clocks, 6.810074 microseconds)
ThrId: 2, Tsc: 3974065840 (14456 clocks, 6.776324 microseconds)
ThrId: 3, Tsc: 3974080296 (14576 clocks, 6.832575 microseconds)
ThrId: 4, Tsc: 3974094872 (14632 clocks, 6.858825 microseconds)
ThrId: 5, Tsc: 3974109504 (14592 clocks, 6.840075 microseconds)
ThrId: 6, Tsc: 3974124096 (14568 clocks, 6.828825 microseconds)
ThrId: 1, Tsc: 3974138664 (14536 clocks, 6.813825 microseconds)
ThrId: 2, Tsc: 3974153200 (14480 clocks, 6.787574 microseconds)
ThrId: 3, Tsc: 3974167680 (14608 clocks, 6.847575 microseconds)
ThrId: 4, Tsc: 3974182288 (14544 clocks, 6.817575 microseconds)
ThrId: 5, Tsc: 3974196832 (14568 clocks, 6.828825 microseconds)
ThrId: 6, Tsc: 3974211400 (14576 clocks, 6.832575 microseconds)
ThrId: 1, Tsc: 3974225976 (14520 clocks, 6.806324 microseconds)
ThrId: 2, Tsc: 3974240496 (14472 clocks, 6.783824 microseconds)
ThrId: 3, Tsc: 3974254968 (14576 clocks, 6.832575 microseconds)
ThrId: 4, Tsc: 3974269544 (14632 clocks, 6.858825 microseconds)
ThrId: 5, Tsc: 3974284176 (14560 clocks, 6.825075 microseconds)
ThrId: 6, Tsc: 3974298736 (14552 clocks, 6.821325 microseconds)
ThrId: 1, Tsc: 3974313288 (188328 clocks, 88.279716 microseconds)
ThrId: 2, Tsc: 3974501616 (81648 clocks, 38.272919 microseconds)
ThrId: 3, Tsc: 3974583264 (108864 clocks, 51.030558 microseconds)
ThrId: 4, Tsc: 3974692128 (66280 clocks, 31.069090 microseconds)
ThrId: 5, Tsc: 3974758408 (123112 clocks, 57.709381 microseconds)
ThrId: 6, Tsc: 3974881520 (2147483648 clocks, -1863246.091754 microseconds)
Press any key to continue ...



The archive contains the program compiled with the "default" settings: thread count is 2, and the timeslice is not dropped. It would be interesting to see how it performs on x64 OSes.

Its results are:

Commiting the memory loops... 3 2 1
Starting threads...
Thread #1 started and waiting for synchronization flag...
Thread #2 started and waiting for synchronization flag...
All threads started and ready to work, flagging them...
ThrId: 1, Tsc: 3190639352 (66646168 clocks, 31240.732946 microseconds)
ThrId: 2, Tsc: 3257285520 (66680104 clocks, 31256.640620 microseconds)
ThrId: 1, Tsc: 3323965624 (66655280 clocks, 31245.004242 microseconds)
ThrId: 2, Tsc: 3390620904 (66685200 clocks, 31259.029396 microseconds)
ThrId: 1, Tsc: 3457306104 (66683048 clocks, 31258.020635 microseconds)
ThrId: 2, Tsc: 3523989152 (66653304 clocks, 31244.077982 microseconds)
ThrId: 1, Tsc: 3590642456 (66661424 clocks, 31247.884274 microseconds)
ThrId: 2, Tsc: 3657303880 (66683016 clocks, 31258.005634 microseconds)
ThrId: 1, Tsc: 3723986896 (66689880 clocks, 31261.223170 microseconds)
ThrId: 2, Tsc: 3790676776 (66651008 clocks, 31243.001720 microseconds)
ThrId: 1, Tsc: 3857327784 (66654632 clocks, 31244.700489 microseconds)
ThrId: 2, Tsc: 3923982416 (66700200 clocks, 31266.060723 microseconds)
ThrId: 1, Tsc: 3990682616 (66639888 clocks, 31237.789163 microseconds)
ThrId: 2, Tsc: 4057322504 (66684024 clocks, 31258.478140 microseconds)
ThrId: 1, Tsc: 4124006528 (66654648 clocks, 31244.707989 microseconds)
ThrId: 2, Tsc: 4190661176 (83116016 clocks, 38961.058637 microseconds)
ThrId: 1, Tsc: 4273777192 (2147483648 clocks, -1989743.356568 microseconds)
ThrId: 2, Tsc: 29037792 (66760952 clocks, 31294.538534 microseconds)
ThrId: 1, Tsc: 95798744 (189608 clocks, 88.879722 microseconds)
ThrId: 2, Tsc: 95988352 (4198978944 clocks, -44995.032133 microseconds)
Press any key to continue ...


Now the TSC (still DWORD sized), the clocks and the microseconds for a thread timeslice are shown.
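
A side note on the DWORD-sized TSC: the huge or negative figures in the listings appear where the low DWORD wraps between two samples (the very last entry of a run has no successor at all, so it is a separate case). A minimal sketch in Python, used here only for the arithmetic and not taken from the attached code, of a wrap-safe difference; `tsc_delta32` is a hypothetical helper name, and the result is only meaningful while the real interval stays below 2**32 clocks:

```python
MASK32 = 0xFFFFFFFF  # only the low DWORD of the TSC is stored in the buffer

def tsc_delta32(start: int, end: int) -> int:
    """Clocks elapsed between two low-DWORD TSC samples, taken modulo 2**32."""
    return (end - start) & MASK32

# Samples copied from the two-thread output above.
print(tsc_delta32(3190639352, 3257285520))  # no wrap: 66646168
print(tsc_delta32(4273777192, 29037792))    # low DWORD wrapped: 50227896
```

In assembly the same effect falls out of a plain 32-bit `sub` on the two DWORDs, as long as the result is then treated as unsigned.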

Request for a test, please :biggrin:



Quote from: qWord on May 14, 2013, 10:47:52 PM
Quote from: Antariy on May 14, 2013, 01:55:00 PM
I have some doubts that the timer resolution has any relation to the scheduler work
You're absolutely right, it sounds more than unrealistic...

BTW: http://technet.microsoft.com/en-us/sysinternals/bb963901, Chapter 5 is available as book preview: http://download.microsoft.com/download/1/4/0/14045A9E-C978-47D1-954B-92B9FD877995/97807356648739_SampleChapters.pdf

Thank you for the information and the file, qWord! :t (I haven't had a chance to check them yet, but will do so later.)


Quote from: jj2007 on May 14, 2013, 11:17:17 PM
Quote from: Antariy on May 14, 2013, 09:20:13 PM
Michael, your idea is very good :t

I can echo that one :t

My "catch the outlier" approach works but it's not very practical. What I can observe, though, is that inside a timeslice you encounter about 1-3 outliers, i.e. cycle counts that are higher than expected but not caused by a context switch (which has a much higher cycle count, e.g. 100 times higher). These outliers might be hardware interrupts, but I am not sure about that.

I think you're right, Jochen :biggrin: It probably has something to do with OS "games" :biggrin:
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 14, 2013, 11:38:25 PM
Added the archive to the previous post. I forgot to attach it when posting ::)
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 14, 2013, 11:41:21 PM
Commiting the memory loops... 3 2 1
Starting threads...
Thread #1 started and waiting for synchronization flag...
Thread #2 started and waiting for synchronization flag...
All threads started and ready to work, flagging them...
ThrId: 1, Tsc: 40662020 (73754145 clocks, 2950165.800000 microseconds)
ThrId: 2, Tsc: 114416165 (73754112 clocks, 2950164.480000 microseconds)
ThrId: 1, Tsc: 188170277 (73752661 clocks, 2950106.440000 microseconds)
ThrId: 2, Tsc: 261922938 (73797654 clocks, 2951906.160000 microseconds)
ThrId: 1, Tsc: 335720592 (73710676 clocks, 2948427.040000 microseconds)
ThrId: 2, Tsc: 409431268 (73756159 clocks, 2950246.360000 microseconds)
ThrId: 1, Tsc: 483187427 (73751858 clocks, 2950074.320000 microseconds)
ThrId: 2, Tsc: 556939285 (73755018 clocks, 2950200.720000 microseconds)
ThrId: 1, Tsc: 630694303 (73752243 clocks, 2950089.720000 microseconds)
ThrId: 2, Tsc: 704446546 (73753786 clocks, 2950151.440000 microseconds)
ThrId: 1, Tsc: 778200332 (73754262 clocks, 2950170.480000 microseconds)
ThrId: 2, Tsc: 851954594 (73752758 clocks, 2950110.320000 microseconds)
ThrId: 1, Tsc: 925707352 (73767315 clocks, 2950692.600000 microseconds)
ThrId: 2, Tsc: 999474667 (73741593 clocks, 2949663.720000 microseconds)
ThrId: 1, Tsc: 1073216260 (73752790 clocks, 2950111.600000 microseconds)
ThrId: 2, Tsc: 1146969050 (73754554 clocks, 2950182.160000 microseconds)
ThrId: 1, Tsc: 1220723604 (73753815 clocks, 2950152.600000 microseconds)
ThrId: 2, Tsc: 1294477419 (73753211 clocks, 2950128.440000 microseconds)
ThrId: 1, Tsc: 1368230630 (90926 clocks, 3637.040000 microseconds)
ThrId: 2, Tsc: 1368321556 (2926645740 clocks, -54732862.240000 microseconds)
Title: Re: SwitchToThread vs Sleep(0)
Post by: FORTRANS on May 15, 2013, 12:14:16 AM
CPU frequency: 801 [MHz]
Timer: MinimumResolution : 0.0100144 -> clocks/quantum: 2.67384E+006
Timer: MaximumResolution : 0.0010032 -> clocks/quantum: 267854
Timer: ActualResolution : 0.0100144 -> clocks/quantum: 2.67384E+006
Press any key to continue ...
Title: Re: SwitchToThread vs Sleep(0)
Post by: qWord on May 15, 2013, 12:45:29 AM
I'm really curious about some results here (or am I not able to interpret them?):
Quote from: jj2007 on May 14, 2013, 11:41:21 PM
ThrId: 1, Tsc: 40662020 (73754145 clocks, 2950165.800000 microseconds)
--> 2.95 seconds per time slice / quantum? The same strange results for my i7: 32 seconds per quantum  :dazzled:

The following quote is from the above linked book "Windows Internals, Sixth Edition, Microsoft Press, Chapter 5: Processes, Threads, and Jobs" (this chapter is available as book preview):
Quote
Quantum
As mentioned earlier in the chapter, a quantum is the amount of time a thread gets to run before
Windows checks to see whether another thread at the same priority is waiting to run. If a thread
completes its quantum and there are no other threads at its priority, Windows permits the thread to
run for another quantum.
On client versions of Windows, threads run by default for 2 clock intervals; on server systems, by
default, a thread runs for 12 clock intervals. (We'll explain how you can change these values later.) The
rationale for the longer default value on server systems is to minimize context switching. By having
a longer quantum, server applications that wake up as the result of a client request have a better
chance of completing the request and going back into a wait state before their quantum ends.
The length of the clock interval varies according to the hardware platform. The frequency of the
clock interrupts is up to the HAL, not the kernel. For example, the clock interval for most x86 uniprocessors
is about 10 milliseconds (note that these machines are no longer supported by Windows and
are only used here for example purposes), and for most x86 and x64 multiprocessors it is about 15
milliseconds.
This clock interval is stored in the kernel variable KeMaximumIncrement as hundreds of
nanoseconds.
Because thread run-time accounting is based on processor cycles, although threads still run in
units of clock intervals, the system does not use the count of clock ticks as the deciding factor for
how long a thread has run and whether its quantum has expired. Instead, when the system starts up,
a calculation is made whose result is the number of clock cycles that each quantum is equivalent to.
(This value is stored in the kernel variable KiCyclesPerClockQuantum.) This calculation is made by multiplying
the processor speed in Hz (CPU clock cycles per second) with the number of seconds it takes
for one clock tick to fire (based on the KeMaximumIncrement value described earlier).
The result of this accounting method is that threads do not actually run for a quantum number
based on clock ticks; they instead run for a quantum target, which represents an estimate of what the
number of CPU clock cycles the thread has consumed should be when its turn would be given up.
This target should be equal to an equivalent number of clock interval timer ticks because, as you just
saw, the calculation of clock cycles per quantum is based on the clock interval timer frequency, which
you can check using the following experiment. On the other hand, because interrupt cycles are not
charged to the thread, the actual clock time might be longer.
[...]
Internally, a quantum unit is represented as one third of a clock tick. (So one clock tick equals three
quantums.) This means that on client Windows systems, threads, by default, have a quantum reset
value of 6 (2 * 3), and that server systems have a quantum reset value of 36 (12 * 3). For this reason,
the KiCyclesPerClockQuantum value is divided by three at the end of the calculation previously described,
because the original value describes only CPU clock cycles per clock interval timer tick.

qWord
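
The calculation described in the quote can be sketched as follows. This is only an illustration of the arithmetic in Python, not kernel code; the function names are made up, and the inputs are the figures FORTRANS posted above (801 MHz, 0.0100144 s and 0.0010032 s). The "clocks/quantum" numbers in that output seem to match CPU frequency times clock interval divided by three, which is an inference from the posted values:

```python
def cycles_per_quantum_unit(cpu_hz: float, clock_interval_s: float) -> float:
    """CPU cycles in one quantum unit, i.e. one third of a clock tick."""
    cycles_per_tick = cpu_hz * clock_interval_s
    return cycles_per_tick / 3

def cycles_per_default_quantum(cpu_hz: float, clock_interval_s: float,
                               quantum_reset: int = 6) -> float:
    """Cycles in a full quantum; 6 units is the client default from the quote."""
    return cycles_per_quantum_unit(cpu_hz, clock_interval_s) * quantum_reset

print(cycles_per_quantum_unit(801e6, 0.0100144))     # compare with 2.67384E+006
print(cycles_per_quantum_unit(801e6, 0.0010032))     # compare with 267854
print(cycles_per_default_quantum(801e6, 0.0100144))  # two clock intervals
```

With the client default of 6 units, a full quantum is simply two clock intervals' worth of cycles.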
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 15, 2013, 01:49:23 AM
Quote from: qWord on May 15, 2013, 12:45:29 AM
I'm really curious about some results here (or am I not able to interpret them?):
Quote from: jj2007 on May 14, 2013, 11:41:21 PM
ThrId: 1, Tsc: 40662020 (73754145 clocks, 2950165.800000 microseconds)
--> 2.95 seconds per time slice / quantum? The same strange results for my i7: 32 seconds

73,754,145 clocks is about 30ms. Probably a comma error somewhere.
Title: Re: SwitchToThread vs Sleep(0)
Post by: qWord on May 15, 2013, 02:01:59 AM
Quote from: jj2007 on May 15, 2013, 01:49:23 AM
73,754,145 clocks is about 30ms. Probably a comma error somewhere.
I think the problem is that the code uses QueryPerformanceFrequency to convert the TSC value.
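
To illustrate the mismatch, here is a small Python sketch of the conversion with two different frequencies. The 25 MHz value is not stated anywhere in the thread; it is only what the posted pair (73754145 clocks, 2950165.8 microseconds) implies. The 2.4 GHz CPU clock is likewise an assumption, picked to land near the "about 30ms" estimate:

```python
def clocks_to_us(clocks: int, counter_hz: float) -> float:
    """Convert a tick count to microseconds for a counter running at counter_hz."""
    return clocks * 1_000_000 / counter_hz

tsc_clocks = 73754145  # from the posted output

# TSC ticks divided by the performance counter frequency (inferred 25 MHz):
print(clocks_to_us(tsc_clocks, 25_000_000))     # 2950165.8, i.e. ~2.95 s

# The same ticks divided by an assumed 2.4 GHz CPU clock:
print(clocks_to_us(tsc_clocks, 2_400_000_000))  # roughly 30.7 ms
```

The tick count is fine in both cases; only the divisor belongs to a different counter.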
Title: Re: SwitchToThread vs Sleep(0)
Post by: qWord on May 15, 2013, 04:02:39 AM
I've also found another interesting fact:
Quote from: msdn: timeBeginPeriod function
This function affects a global Windows setting. Windows uses the lowest value (that is, highest resolution) requested by any process. Setting a higher resolution can improve the accuracy of time-out intervals in wait functions. However, it can also reduce overall system performance, because the thread scheduler switches tasks more often. High resolutions can also prevent the CPU power management system from entering power-saving modes

After closing all running applications, the timer resolution on my machine got coarser by a factor of 10 (from 1 ms to 10 ms), which means that the quantum length increased by a factor of 10:
Quote
CPU frequency: 2294 [MHz]
Timer: MinimumResolution : 0.0156001 s -> clocks/quantum: 1.19289E+007
Timer: MaximumResolution : 0.0005 s -> clocks/quantum: 382333
Timer: ActualResolution : 0.01 s -> clocks/quantum: 7.64667E+006
Press any key to continue ...
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 15, 2013, 04:45:26 AM
i think the real answer is....
the length of a timeslice varies, with a number of contributing factors
not the least of which is the OS version
it seems the scheduler underwent major changes with each version of windows

that's not to say that it can't be discovered and utilized
i would say that, if you can run a test in less than the length of a quantum, you might get more stable results

the variations in results seem to be due to using CPUID to serialize instructions
at least, that's my take on it
now - it may be that the very nature of serializing instructions accounts for the variations in CPUID - lol
sounds like a chicken and egg thing

i have some ideas on serialization that don't involve CPUID
maybe i'll write some test code and post   :biggrin:
give me a few days to finish up this current project
Title: Re: SwitchToThread vs Sleep(0)
Post by: MichaelW on May 15, 2013, 08:13:40 AM
By running at the highest possible priority you can at least minimize the effects of context switches. And although I have no hardware to test this on, I suspect that on a system with multiple physical cores, and with your test confined to a single core, you can practically eliminate the effects of context switches.

Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 15, 2013, 08:59:45 AM
Quote from: MichaelW on May 15, 2013, 08:13:40 AM
... , you can practically eliminate the effects of context switches.

The best option would be to eliminate the context switches themselves. Your two-thread code provides a way to calculate how many test loops are needed to get timings just before the next context switch (with many extra cycles) happens.
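
A rough sketch of that calculation, in Python just for the arithmetic. `loops_per_timeslice` and the 10 percent safety margin are assumptions for illustration; the slice length is one of the values measured above, and the 1000 clocks per loop is made up:

```python
def loops_per_timeslice(slice_clocks: int, clocks_per_loop: int,
                        margin_percent: int = 10) -> int:
    """How many test loops fit into one timeslice, leaving a safety margin."""
    usable_clocks = slice_clocks * (100 - margin_percent) // 100
    return usable_clocks // clocks_per_loop

# 66646168 clocks per slice (measured above), 1000 clocks per loop (made up)
print(loops_per_timeslice(66646168, 1000))  # 59981
```

The test would start right after a fresh slice begins and stop after that many loops.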
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 15, 2013, 12:38:18 PM
Quote from: qWord on May 15, 2013, 12:45:29 AM
I'm really curious about some results here (or am I not able to interpret them?):
Quote from: jj2007 on May 14, 2013, 11:41:21 PM
ThrId: 1, Tsc: 40662020 (73754145 clocks, 2950165.800000 microseconds)
--> 2.95 seconds per time slice / quantum? The same strange results for my i7: 32 seconds per quantum  :dazzled:

The following quote is from the above linked book "Windows Internals, Sixth Edition, Microsoft Press, Chapter 5: Processes, Threads, and Jobs" (this chapter is available as book preview):


Quote from: jj2007 on May 15, 2013, 01:49:23 AM
Quote from: qWord on May 15, 2013, 12:45:29 AM
I'm really curious about some results here (or am I not able to interpret them?):
Quote from: jj2007 on May 14, 2013, 11:41:21 PM
ThrId: 1, Tsc: 40662020 (73754145 clocks, 2950165.800000 microseconds)
--> 2.95 seconds per time slice / quantum? The same strange results for my i7: 32 seconds

73,754,145 clocks is about 30ms. Probably a comma error somewhere.

Thank you for tests and notes! :biggrin:


Quote from: qWord on May 15, 2013, 02:01:59 AM
Quote from: jj2007 on May 15, 2013, 01:49:23 AM
73,754,145 clocks is about 30ms. Probably a comma error somewhere.
I think the problem is that the code uses QueryPerformanceFrequency to convert the TSC value.

Hmm... Interesting. Maybe on some systems it returns not the CPU frequency but some scaled value derived from it. Thank you, qWord! :t I have now changed the code to calculate the CPU frequency at runtime; that will probably work better than relying on an API ::)


Quote from: qWord on May 15, 2013, 04:02:39 AM
I've also found another interesting fact:
Quote from: msdn: timeBeginPeriod function
This function affects a global Windows setting. Windows uses the lowest value (that is, highest resolution) requested by any process. Setting a higher resolution can improve the accuracy of time-out intervals in wait functions. However, it can also reduce overall system performance, because the thread scheduler switches tasks more often. High resolutions can also prevent the CPU power management system from entering power-saving modes

After closing all running applications, the timer resolution on my machine got coarser by a factor of 10 (from 1 ms to 10 ms), which means that the quantum length increased by a factor of 10:
Quote
CPU frequency: 2294 [MHz]
Timer: MinimumResolution : 0.0156001 s -> clocks/quantum: 1.19289E+007
Timer: MaximumResolution : 0.0005 s -> clocks/quantum: 382333
Timer: ActualResolution : 0.01 s -> clocks/quantum: 7.64667E+006
Press any key to continue ...

Strange, I tried playing with these functions (timeBeginPeriod and NtSetTimerResolution) yesterday (previous page), but saw no change at all. You are probably right that this requires closing all apps, and maybe running the program with admin privileges (did you run it as admin?).


Quote from: dedndave on May 15, 2013, 04:45:26 AM
the variations in results seem to be due to using CPUID to serialize instructions
at least, that's my take on it
now - it may be that the very nature of serializing instructions accounts for the variations in CPUID - lol
sounds like a chicken and egg thing

i have some ideas on serialization that don't involve CPUID
maybe i'll write some test code and post   :biggrin:
give me a few days to finish up this current project

Yes, Dave, it sounds so :biggrin: In the "Test of timing code" thread you saw that almost all problems of the timing frame were caused by instability in CPUID timings - on some CPUs, like Jochen's, it is very stable, but on most others it varies a lot (relative to code which runs for just tens of clocks).

I'm interested to see your idea :t




As for the attached code: I rewrote the timeslice "calculation" code as stand-alone code. It now consists of two procs and has no global variables to rely on, i.e. it is now reentrant (and may be run, for example, in different threads, each set up on its own core) and reusable (copy-paste). I also added conditional compilation flags to suppress info output such as "Thread #x started", in case anyone wants to use it silently in their programs. It also has a BOOL flag which, if the conditional compilation flag allows info output, switches the info output on or off at runtime.

Usage:


LOCAL pBuff:DWORD
...
    invoke calc_timeslice,TIMES_TO_SWITCH,THREADS_COUNT,addr pBuff,1
...
mov eax,pBuff
; EAX now points to a buffer whose data format is described below


First param, TIMES_TO_SWITCH, sets how many times the threads are switched. Do not set it too high: if the timeslice is ~31 ms and there are 2 threads, for instance, then 10 switches take at least ~620 ms to measure.
Second param, THREADS_COUNT, is the number of CPU-greedy concurrent threads to run in the test. The best value is 2; other values may produce instability because CPU time becomes scarce.
Third param is a pointer to a variable which, after the call, holds the address of a buffer with data in this format, an array of structure elements:
ThreadIdentified  dd ?
TimeStampCounter dq ?

Yes, the code now saves the full 64-bit TSC, so there should not be any problems with overflows.
The TSC is the value at the moment the thread got switched to, so to get a timeslice in clocks, just subtract the TSC of the required array entry from the TSC of the next array entry.
Since we actually need the length of a timeslice in clocks rather than in seconds, the data array holds clock counts; it is not converted to microseconds there. That is done by the testing and display code in the "main" proc.
Fourth param is a BOOL value: if it's FALSE, no info is displayed while the code runs.
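
For anyone post-processing such a buffer outside the program, here is a minimal Python sketch of the subtraction described above. It assumes the entries are packed little-endian 12-byte records (a dd followed by a dq, no padding); that layout is an assumption, not something checked against the attached source. `timeslices` is a made-up helper name:

```python
import struct

RECORD = struct.Struct("<IQ")  # assumed layout: dd thread id, dq TSC, packed

def timeslices(buf: bytes):
    """Return (thread_id, clocks) pairs: each entry's TSC subtracted from the next one's."""
    records = list(RECORD.iter_unpack(buf))
    return [(tid, next_tsc - tsc)
            for (tid, tsc), (_, next_tsc) in zip(records, records[1:])]

# Three entries rebuilt from the hex TSC values in the output below.
sample = b"".join(RECORD.pack(tid, tsc) for tid, tsc in [
    (1, 0xDC58E182110), (2, 0xDC5920AE248), (1, 0xDC5960AEF68)])
print(timeslices(sample))  # [(1, 66240824), (2, 67112224)]
```

The last entry has no successor, so N entries give N-1 timeslices.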

The code doesn't set affinity and priority levels - that is left to the caller and can be experimented with a lot.


Also updated the display code to work with QWORDs for the TSC and the TSC difference (it's excessive but OK).

Also changed the code that gets the CPU frequency to a "handmade" version, in the display part of the program.

There may be some other changes too, but the most relevant ones are described above.
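
The idea behind such a "handmade" frequency measurement, sketched in Python for the arithmetic only: read the TSC, sleep for a known time, read it again, and divide. `estimate_hz` is a hypothetical name, and the two TSC readings are invented so that they reproduce the frequency printed in the output below; they are not measured values:

```python
def estimate_hz(tsc_start: int, tsc_end: int, elapsed_ms: int) -> int:
    """Estimate the TSC frequency from two readings taken elapsed_ms apart."""
    return (tsc_end - tsc_start) * 1000 // elapsed_ms

# Invented readings around a Sleep(2000) that really took 2000 ms.
hz = estimate_hz(0, 4230363048, 2000)
print(hz)                          # 2115181524
print(66240824 * 1_000_000 / hz)   # first timeslice below, about 31316.85 us
```

The accuracy depends on how precisely the elapsed time is known, which is presumably why the program also prints how long Sleep(2000) really took.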

Thoughts are welcome :t



Commiting the memory loops... 3 2 1
Starting threads...
Thread #1 started and waiting for synchronization flag...
Thread #2 started and waiting for synchronization flag...
All threads started and ready to work, flagging them...
Calculating CPU speed... (Sleep(2000) takes: 2000): ~2115181524 Hz
ThrId: 1, Tsc: DC58E182110 (66240824 clocks, 31316.851 microseconds)
ThrId: 2, Tsc: DC5920AE248 (67112224 clocks, 31728.825 microseconds)
ThrId: 1, Tsc: DC5960AEF68 (66228776 clocks, 31311.155 microseconds)
ThrId: 2, Tsc: DC599FD8190 (66749072 clocks, 31557.136 microseconds)
ThrId: 1, Tsc: DC59DF80420 (66594280 clocks, 31483.955 microseconds)
ThrId: 2, Tsc: DC5A1F02A08 (67835888 clocks, 32070.953 microseconds)
ThrId: 1, Tsc: DC5A5FB41F8 (65500152 clocks, 30966.681 microseconds)
ThrId: 2, Tsc: DC5A9E2B5F0 (67209024 clocks, 31774.589 microseconds)
ThrId: 1, Tsc: DC5ADE43D30 (66131200 clocks, 31265.023 microseconds)
ThrId: 2, Tsc: DC5B1D55230 (67680192 clocks, 31997.345 microseconds)
ThrId: 1, Tsc: DC5B5DE09F0 (65662680 clocks, 31043.520 microseconds)
ThrId: 2, Tsc: DC5B9C7F8C8 (67017584 clocks, 31684.082 microseconds)
ThrId: 1, Tsc: DC5BDC69438 (66318928 clocks, 31353.776 microseconds)
ThrId: 2, Tsc: DC5C1BA8688 (82939136 clocks, 39211.356 microseconds)
ThrId: 1, Tsc: DC5C6AC1388 (50481152 clocks, 23866.109 microseconds)
ThrId: 2, Tsc: DC5C9AE5B88 (67595328 clocks, 31957.223 microseconds)
ThrId: 1, Tsc: DC5CDB5C7C8 (65664232 clocks, 31044.254 microseconds)
ThrId: 2, Tsc: DC5D19FBCB0 (67142456 clocks, 31743.118 microseconds)
Press any key to continue ...

Title: Re: SwitchToThread vs Sleep(0)
Post by: qWord on May 15, 2013, 12:54:26 PM
Quote from: Antariy on May 15, 2013, 12:38:18 PM
Hmm... Interesting. Maybe some systems return not CPU freq but some scale value from this, thank you, qWord! :t Changed code now to calculate CPU freq at runtime, probably it will work better than relying on an API ::)

Windows' performance counters are implemented with the APIC (Advanced Programmable Interrupt Controller) - the frequency is independent of the CPU's frequency.
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 15, 2013, 01:09:18 PM
It would be interesting to check how it behaves - which values we get - before and after changing the timer resolution.

Edited.

Quote from: qWord on May 15, 2013, 12:54:26 PM
Quote from: Antariy on May 15, 2013, 12:38:18 PM
Hmm... Interesting. Maybe some systems return not CPU freq but some scale value from this, thank you, qWord! :t Changed code now to calculate CPU freq at runtime, probably it will work better than relying on an API ::)
Windows' performance counters are implemented with the APIC (Advanced Programmable Interrupt Controller) - the frequency is independent of the CPU's frequency.

Not always independent. Well, yes, my info is rusty :lol: Probably it is because of the standard kernel (by kernel I mean the "HAL" and the "kernel" together) on a single-core machine ::)


At least we can probably assume that the "default" 15.625 ms on NT systems is more or less the maximum, so it is better to fit the timing measurements to something like 10 ms.
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 15, 2013, 03:45:39 PM
Hi Alex,
mw blocked my one-core PC completely - I had to press power off for five seconds... :(
Title: Re: SwitchToThread vs Sleep(0)
Post by: MichaelW on May 15, 2013, 07:41:31 PM
Quote from: jj2007 on May 15, 2013, 03:45:39 PM
Hi Alex,
mw blocked my one-core PC completely - I had to press power off for five seconds... :(

I thought I included a warning. For my P3 I use HIGH_PRIORITY_CLASS and THREAD_PRIORITY_NORMAL or THREAD_PRIORITY_ABOVE_NORMAL, but for my P4 with HT I can max out the priority, no problems.
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 15, 2013, 10:47:09 PM
Quote from: jj2007 on May 15, 2013, 03:45:39 PM
Hi Alex,
mw blocked my one-core PC completely - I had to press power off for five seconds... :(

Sorry, Jochen :(
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 15, 2013, 10:57:56 PM
Jochen, in such a circumstance did you try to press and hold [Ctrl]+[C]? This should terminate the program within some seconds (10~20). It will help if the console prog hangs, but if some OS service or something similar has crashed or hung, then nothing will help. (This may sound strange or funny, but once I got a freeze when I held the left mouse button on a page in Acrobat Reader (version 7), slowly scrolling the page by dragging it with the "hand". The CPU usage was very high at that moment - something strange happened to the OS.)
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 15, 2013, 11:02:02 PM
Quote from: Antariy on May 15, 2013, 10:57:56 PM
Jochen, did you try in such a circumstance to press and hold [Ctrl]+[C]...

Don't worry, Alex, I had no files open, and no data were lost.
Ctrl C was what I tried first, but it was really blocked completely - no mouse, no keyboard reaction.
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 17, 2013, 12:10:36 AM
well, it probably needs some fine-tuning, but this demonstrates my serialization concept

it's based on a "natural" serialization of the code stream
i.e., rather than using one of the serializing instructions (per intel),
force the code stream to serialize based on register content
    _serializ MACRO
        pushad
        pop     edi
        pop     esi
        pop     ebp
        pop     eax
        pop     ebx
        pop     edx
        pop     eax
        pop     ecx
        xchg    eax,ecx
    ENDM

the CPU can't perform out-of-order execution if it has to wait for the sequence to finish   :biggrin:
all registers are involved, so it has to wait
as a matter of coincidence, the sequence is completely benign - no registers or flags are modified
hopefully, the time it takes to execute our sequence is more stable/repeatable than CPUID
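
the "no registers are modified" claim can be checked on paper with a small simulation
this is just a python sketch that tracks register values - it says nothing about timing or about whether the sequence really serializes, and the function name serializ is only borrowed from the macro for illustration

```python
# Illustrative simulation of the register shuffle in the _serializ macro.
# It only tracks values; it does not model timing or out-of-order execution.
PUSHAD_ORDER = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")
POP_ORDER = ("edi", "esi", "ebp", "eax", "ebx", "edx", "eax", "ecx")


def serializ(regs):
    """Apply PUSHAD, the eight POPs and the final XCHG to a register dict."""
    r = dict(regs)
    # PUSHAD pushes EAX, ECX, EDX, EBX, the original ESP, EBP, ESI, EDI.
    stack = [r[name] for name in PUSHAD_ORDER]
    for dest in POP_ORDER:
        r[dest] = stack.pop()            # POP takes the most recently pushed value
    # After the POPs, EAX holds the old ECX and ECX holds the old EAX.
    r["eax"], r["ecx"] = r["ecx"], r["eax"]   # xchg eax,ecx
    return r


before = {"eax": 1, "ecx": 2, "edx": 3, "ebx": 4,
          "esp": 5, "ebp": 6, "esi": 7, "edi": 8}
print(serializ(before) == before)
```

the fourth POP drops the saved ESP value into EAX, where it is overwritten two POPs later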

as i said, we need to do some fine-tuning
but, here is a sample run
i do still get outliers, in spite of the single-quantum execution
11 8 5 7 8 7 8 10 8 8
24 25 22 22 23 21 24 21 24 26
56 55 57 56 55 55 56 57 53 56


but - i think we have a nice starting place

EDIT: attachment updated - see reply #57
Title: Re: SwitchToThread vs Sleep(0)
Post by: FORTRANS on May 17, 2013, 02:44:17 AM
Hi,

   Just looked at the Intel manuals I have.  Not in the serializing
section, but in the atomic operations section, it says that those
will serialize things as well.  Have you considered using an XCHG
Reg,Mem or the like to serialize things?  Just curious.  It seems
that it should work better than a CPUID.

Regards,

Steve N.
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 17, 2013, 04:07:14 AM
Quote from: FORTRANS on May 17, 2013, 02:44:17 AM
Have you considered using an XCHG Reg,Mem or the like to serialize things?

Steve,
Thanks, indeed that was one of the first things I tested, but no real difference. Besides, cpuid seems to be the "official" way to serialise.
Title: Re: SwitchToThread vs Sleep(0)
Post by: FORTRANS on May 17, 2013, 04:13:27 AM
Hi,

   Okay, I had not noticed that you had looked at that.  And
yes, CPUID seems to be the code of choice in the samples I
have seen.

Thanks,

Steve N.
Title: Re: SwitchToThread vs Sleep(0)
Post by: Gunther on May 17, 2013, 04:29:57 AM
Hi Steve,

Quote from: FORTRANS on May 17, 2013, 04:13:27 AM
And yes, CPUID seems to be the code of choice in the samples I have seen.

yes, CPUID seems to be the best choice. Agner Fog recommends it, too.

Gunther
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 17, 2013, 05:43:59 AM
it would seem that CALL/RET does a fair job
C:\Masm32\Asm32 => dTime2
4 5 4 6 4 5 4 4 4 5
277 276 280 277 278 275 277 279 277 278
83 82 83 82 83 82 78 83 84 81
Press any key to continue ...

C:\Masm32\Asm32 => dTime2
3 3 4 3 3 3 6 4 4 3
275 279 280 277 277 276 279 275 281 275
82 83 83 83 83 84 82 84 83 81
Press any key to continue ...
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 17, 2013, 10:16:25 AM
        INVOKE  dTime,CodeToMeasure1,HIGH_PRIORITY_CLASS,THREAD_PRIORITY_ABOVE_NORMAL
;***********************************************************************************************

dTime   PROC USES EBX ESI EDI lpfnProc:LPVOID,dwPriClass:DWORD,dwPriLevel:DWORD

;Code Timing Function, David R. Sheldon - DednDave, Ver 1.1, May 2013

;--------------------------------------------------

;Call With: lpfnProc   = address of function to be timed
;           dwPriClass = process priority class
;           dwPriLevel = thread priority level
;
;  Returns: EAX        = clock cycles (not including function CALL/RET overhead)
;
;Also Uses: ECX, EDX, all other registers are preserved
;
;    Notes: 1) The function referenced by lpfnProc may not have any arguments.
;              It may, however, contain local variables. The time consumed creating and
;              destroying the stack frame will be included in the timing measurement.
;           2) The function referenced by lpfnProc may destroy any register contents,
;              but must balance the stack (ESP) before RETurn, of course.

;--------------------------------------------------

;Process priority class       Thread priority level     Base priority
;
;IDLE_PRIORITY_CLASS          THREAD_PRIORITY_IDLE            1
;                             THREAD_PRIORITY_LOWEST          2
;                             THREAD_PRIORITY_BELOW_NORMAL    3
;                             THREAD_PRIORITY_NORMAL          4
;                             THREAD_PRIORITY_ABOVE_NORMAL    5
;                             THREAD_PRIORITY_HIGHEST         6
;                             THREAD_PRIORITY_TIME_CRITICAL  15
;
;BELOW_NORMAL_PRIORITY_CLASS  THREAD_PRIORITY_IDLE            1
;                             THREAD_PRIORITY_LOWEST          4
;                             THREAD_PRIORITY_BELOW_NORMAL    5
;                             THREAD_PRIORITY_NORMAL          6
;                             THREAD_PRIORITY_ABOVE_NORMAL    7
;                             THREAD_PRIORITY_HIGHEST         8
;                             THREAD_PRIORITY_TIME_CRITICAL  15
;
;NORMAL_PRIORITY_CLASS        THREAD_PRIORITY_IDLE            1
;                             THREAD_PRIORITY_LOWEST          6
;                             THREAD_PRIORITY_BELOW_NORMAL    7
;                             THREAD_PRIORITY_NORMAL          8
;                             THREAD_PRIORITY_ABOVE_NORMAL    9
;                             THREAD_PRIORITY_HIGHEST        10
;                             THREAD_PRIORITY_TIME_CRITICAL  15
;
;ABOVE_NORMAL_PRIORITY_CLASS  THREAD_PRIORITY_IDLE            1
;                             THREAD_PRIORITY_LOWEST          8
;                             THREAD_PRIORITY_BELOW_NORMAL    9
;                             THREAD_PRIORITY_NORMAL         10
;                             THREAD_PRIORITY_ABOVE_NORMAL   11
;                             THREAD_PRIORITY_HIGHEST        12
;                             THREAD_PRIORITY_TIME_CRITICAL  15
;
;HIGH_PRIORITY_CLASS          THREAD_PRIORITY_IDLE            1
;                             THREAD_PRIORITY_LOWEST         11
;                             THREAD_PRIORITY_BELOW_NORMAL   12
;                             THREAD_PRIORITY_NORMAL         13
;                             THREAD_PRIORITY_ABOVE_NORMAL   14
;                             THREAD_PRIORITY_HIGHEST        15
;                             THREAD_PRIORITY_TIME_CRITICAL  15
;
;REALTIME_PRIORITY_CLASS      THREAD_PRIORITY_IDLE           16
;                             THREAD_PRIORITY_LOWEST         22
;                             THREAD_PRIORITY_BELOW_NORMAL   23
;                             THREAD_PRIORITY_NORMAL         24
;                             THREAD_PRIORITY_ABOVE_NORMAL   25
;                             THREAD_PRIORITY_HIGHEST        26
;                             THREAD_PRIORITY_TIME_CRITICAL  31

;**************************************************

;local variables

;--------------------------------------------------

        LOCAL   _hProcess     :HANDLE
        LOCAL   _hThread      :HANDLE
        LOCAL   _dwPriClass   :DWORD
        LOCAL   _dwPriLevel   :DWORD
        LOCAL   _dwAffinity   :DWORD
        LOCAL   _dwTerminalHi :DWORD
        LOCAL   _dwTerminalLo :DWORD
        LOCAL   _dwTallyHi    :DWORD
        LOCAL   _dwTallyLo    :DWORD
        LOCAL   _dwPassCount  :DWORD

;**************************************************

;initialization

;--------------------------------------------------

        INVOKE  GetCurrentProcess
        mov     _hProcess,eax
        INVOKE  GetPriorityClass,eax
        mov     _dwPriClass,eax
        INVOKE  GetCurrentThread
        mov     _hThread,eax
        INVOKE  GetThreadPriority,eax
        mov     _dwPriLevel,eax
        INVOKE  GetProcessAffinityMask,_hProcess,addr _dwAffinity,addr _dwPassCount
        INVOKE  SetProcessAffinityMask,_hProcess,1
        rdtsc
        mov     esi,eax
        mov     edi,edx
        INVOKE  Sleep,125
        rdtsc
        xor     ecx,ecx
        mov     _dwTerminalLo,eax
        mov     _dwTerminalHi,edx
        mov     _dwPassCount,ecx
        sub     eax,esi
        sbb     edx,edi
        mov     _dwTallyLo,ecx
        shld    edx,eax,2
        shl     eax,2
        mov     _dwTallyHi,ecx
        add     _dwTerminalLo,eax
        adc     _dwTerminalHi,edx
        INVOKE  Sleep,ecx
        mov     edi,offset DummyProc
        call    SinglePass
        mov     edi,offset DummyProc
        call    SinglePass
        mov     edi,offset DummyProc
        call    SinglePass
        jmp short TopOfLoop

;**************************************************

;measurement single pass

;--------------------------------------------------

;EDI = proc address

        ALIGN   16

SinglePass:
        INVOKE  SetPriorityClass,_hProcess,dwPriClass
        INVOKE  Sleep,0               ;bind new priority
        INVOKE  SetThreadPriority,_hThread,dwPriLevel
        INVOKE  Sleep,0               ;bind new level
        INVOKE  Sleep,0               ;fresh slice
        rdtsc
        push    edx                   ;Ta
        push    eax
        push    ebp
        call    edi                   ;proc to be measured
        rdtsc
        pop     ebp
        push    edx                   ;Tb
        push    eax
        INVOKE  SetPriorityClass,_hProcess,_dwPriClass
        INVOKE  Sleep,0               ;bind new priority
        INVOKE  SetThreadPriority,_hThread,_dwPriLevel
        INVOKE  Sleep,0               ;bind new level
        pop     eax
        pop     edx
        pop     esi
        pop     edi
        mov     ecx,eax
        mov     ebx,edx               ;EBX:ECX = last tsc reading
        sub     eax,esi
        sbb     edx,edi               ;EDX:EAX = measured time
        retn

;**************************************************

;empty proc

;--------------------------------------------------

        ALIGN   16

DummyProc:
        retn

;**************************************************

;measurement loop

;--------------------------------------------------

TopOfLoop:
        mov     edi,offset DummyProc  ;empty proc for reference
        call    SinglePass
        push    edx
        push    eax
        mov     edi,lpfnProc          ;code to be measured
        call    SinglePass
        pop     esi
        pop     edi
        sub     eax,esi
        sbb     edx,edi
        inc     dword ptr _dwPassCount
        add     _dwTallyLo,eax
        adc     _dwTallyHi,edx
        cmp     ebx,_dwTerminalHi
        jb      TopOfLoop

        ja      dTally

        cmp     ecx,_dwTerminalLo
        jb      TopOfLoop

;**************************************************

;tally results and exit

;--------------------------------------------------

dTally: INVOKE  SetProcessAffinityMask,_hProcess,_dwAffinity
        mov     edx,_dwTallyHi
        mov     eax,_dwTallyLo
        or      edx,edx
        mov     ecx,_dwPassCount
        jns     Tdivis

        xor     eax,eax
        jmp short dTExit

Tdivis: cmp     ecx,1
        jbe     dTExit

        div     ecx
        shl     edx,1
        cmp     edx,ecx
        sbb     eax,-1

dTExit: ret

dTime   ENDP

;***********************************************************************************************
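
the rounding at the end of the tally section (DIV, then SHL/CMP/SBB) may be easier to follow in a high-level form
below is a python sketch of what that sequence computes, under the assumption that the tally fits the 64/32-bit DIV without overflow - the function name rounded_average is made up for illustration

```python
# Illustrative sketch of the result calculation at dTally/Tdivis.
# Assumes the quotient fits in 32 bits (DIV overflow is not modelled).
def rounded_average(tally, passes):
    """Return the average cycles per pass, rounded half up."""
    if tally < 0:
        return 0                         # jns not taken: xor eax,eax
    if passes <= 1:
        return tally & 0xFFFFFFFF        # jbe dTExit: EAX is the low dword as-is
    quotient, remainder = divmod(tally, passes)   # div ecx
    # shl edx,1 / cmp edx,ecx sets carry when 2*remainder < passes;
    # sbb eax,-1 then computes eax + 1 - carry.
    carry = 1 if 2 * remainder < passes else 0
    return quotient + 1 - carry


print(rounded_average(10, 4), rounded_average(9, 4))
```

a negative tally (overhead larger than the measured code) is reported as 0 rather than as a huge unsigned number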
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 17, 2013, 11:52:05 AM
Hi Dave :t
For dtime2:

44 36 30 28 4 7 24 35 0 28
356 468 415 386 405 354 384 431 384 376
48 58 77 144 76 84 103 101 77 79
Press any key to continue ...


Can you build and post the full code of your previous post?
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 17, 2013, 01:05:48 PM
it's attached in reply #57, Alex

here is version 1.0, which has the serialization code
but, version 1.1 is a little better, i think
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 17, 2013, 01:13:38 PM
Ah, then the results in my previous message were for the right thing - those results are for dTime2.zip, but I thought maybe I had missed something.
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 17, 2013, 01:27:23 PM
Results for dTime1.zip:

4 34 0 0 10 7 7 20 33 13
27 32 51 82 21 8 22 24 24 26
65 67 79 104 104 92 74 135 36 74
Press any key to continue ...
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 17, 2013, 10:52:47 PM
thanks, Alex   :t

as i suspected, dTime2 is better
it still needs work, but i like certain features

1) the code to be timed is in a PROC, rather than using macros before and after
    and, the timing function is a PROC
2) the function allows adjustment of thread priority level
3) the loop count has been eliminated
    we know from experience that a run of ~0.5 seconds yields good results
    so, the dTime function runs until time has elapsed, rather than some pre-determined loop count
4) we try to eliminate the use of CPUID to serialize code
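
point 3 can be sketched like this - a python illustration only, with a fake counter standing in for RDTSC so the example is repeatable
the names run_until and FakeCounter are made up; the "4 x calibration" deadline mirrors the 125 ms Sleep scaled by 4 in the init code

```python
# Illustrative sketch of a time-bounded measurement loop (no fixed loop count).
class FakeCounter:
    """Stand-in for RDTSC: advances by a fixed step on every read."""
    def __init__(self, step):
        self.now = 0
        self.step = step

    def __call__(self):
        self.now += self.step
        return self.now


def run_until(read_counter, calibration_ticks, one_pass):
    """Repeat one_pass() until 4 * calibration_ticks have elapsed.

    Returns (number of passes, sum of the per-pass results).
    """
    deadline = read_counter() + 4 * calibration_ticks
    passes = 0
    total = 0
    while True:
        total += one_pass()              # one measurement
        passes += 1
        if read_counter() >= deadline:   # checked after each pass, like the asm loop
            return passes, total


print(run_until(FakeCounter(10), 25, lambda: 3))
```

the number of passes then depends on how long one pass takes, not on a constant picked in advance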
Title: Re: SwitchToThread vs Sleep(0)
Post by: FORTRANS on May 17, 2013, 10:53:18 PM
Hi Dave,

   Dtime2 results from the oldies.

Cheers,

Steve N.


P-III
4 3 2 3 5 0 0 0 3 2
99 97 95 95 95 96 102 98 93 100
117 114 110 120 122 118 120 121 117 121
Press any key to continue ...

P-MMX
4 3 4 4 4 5 4 4 4 4
19 19 60 9 19 20 21 19 19 19
137 130 126 130 130 133 130 130 130 130
Press any key to continue ...

Pentium M
3 3 3 0 5 3 4 3 3 0
118 118 126 135 113 124 119 118 118 118
32 34 34 35 34 34 34 34 34 34
Press any key to continue ...
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 17, 2013, 10:55:09 PM
thanks Steve
the results are fairly stable
it's cool to see the differences, as processors evolved   :P
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 17, 2013, 11:28:59 PM
Hi Dave,
You will love this one - AMD Athlon:
39 40 40 40 40 40 40 40 40 40
80 80 80 80 80 80 80 80 80 80
120 120 120 120 120 127 120 120 120 120
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 17, 2013, 11:43:19 PM
those are great numbers, Jochen   :biggrin:

still, some issues to work out
i will play with it more tomorrow

what do you think of the code ?
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 19, 2013, 08:34:45 PM
Quote from: dedndave on May 17, 2013, 11:43:19 PM
what do you think of the code ?

Looks ok, but seems not to like some other CPUs ;-)

In the meantime, I've worked out something based on Michael's idea. Here are some results for the timeslice length:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
6       valid tests, 116714     avg kCycles
7       valid tests, 116547     avg kCycles
6       valid tests, 116796     avg kCycles
7       valid tests, 116774     avg kCycles
7       valid tests, 116523     avg kCycles
5       valid tests, 116784     avg kCycles
7       valid tests, 116728     avg kCycles
6       valid tests, 116711     avg kCycles
6       valid tests, 116722     avg kCycles
8       valid tests, 116493     avg kCycles

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
9       valid tests, 49989      avg kCycles
10      valid tests, 49811      avg kCycles
9       valid tests, 50027      avg kCycles
9       valid tests, 49934      avg kCycles
9       valid tests, 49973      avg kCycles
9       valid tests, 50028      avg kCycles
10      valid tests, 49887      avg kCycles
9       valid tests, 49899      avg kCycles
9       valid tests, 50019      avg kCycles
10      valid tests, 49805      avg kCycles
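
To put these numbers into perspective, the averages can be converted to milliseconds. The Python sketch below is only an illustration and assumes the TSC ticks at the nominal frequency from the CPU brand string, which may not hold with turbo or throttling; the function name slice_ms is made up.

```python
# Illustrative conversion of "avg kCycles" to milliseconds.
# Assumption: the TSC runs at the nominal frequency from the brand string.
def slice_ms(avg_kcycles, nominal_ghz):
    """Return the timeslice length in milliseconds."""
    cycles = avg_kcycles * 1_000
    seconds = cycles / (nominal_ghz * 1e9)
    return seconds * 1_000


for name, kcycles, ghz in (("i5-2450M", 116714, 2.5),
                           ("Celeron M 420", 49989, 1.6)):
    print(f"{name}: {slice_ms(kcycles, ghz):.1f} ms")
```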

Title: Re: SwitchToThread vs Sleep(0)
Post by: Gunther on May 19, 2013, 08:53:30 PM
Jochen,

my results:


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
10      valid tests, 158126     avg kCycles
8       valid tests, 158434     avg kCycles
8       valid tests, 158700     avg kCycles
9       valid tests, 158526     avg kCycles
5       valid tests, 158665     avg kCycles
10      valid tests, 137386     avg kCycles
8       valid tests, 158749     avg kCycles
8       valid tests, 158870     avg kCycles
9       valid tests, 158754     avg kCycles
8       valid tests, 158651     avg kCycles
ok


Gunther
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 19, 2013, 11:18:22 PM
that seems to look pretty good on my prescott   :biggrin:

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
9       valid tests, 93738      avg kCycles
9       valid tests, 93644      avg kCycles
9       valid tests, 93766      avg kCycles
10      valid tests, 93547      avg kCycles
9       valid tests, 93741      avg kCycles
9       valid tests, 93710      avg kCycles
9       valid tests, 93745      avg kCycles
9       valid tests, 93782      avg kCycles
9       valid tests, 93754      avg kCycles
9       valid tests, 93799      avg kCycles
Title: Re: SwitchToThread vs Sleep(0)
Post by: Antariy on May 20, 2013, 12:22:11 PM
Hi Jochen :t

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
9       valid tests, 65206      avg kCycles
9       valid tests, 66479      avg kCycles
9       valid tests, 66365      avg kCycles
9       valid tests, 66431      avg kCycles
10      valid tests, 66133      avg kCycles
10      valid tests, 66256      avg kCycles
9       valid tests, 66600      avg kCycles
9       valid tests, 66179      avg kCycles
10      valid tests, 65903      avg kCycles
9       valid tests, 66016      avg kCycles
ok
Title: Re: SwitchToThread vs Sleep(0)
Post by: Obivan on May 30, 2013, 02:16:18 AM
Hi Jochen,

here my results:
Intel(R) Xeon(R) CPU E31230 @ 3.20GHz (SSE4)
10      valid tests, 102589     avg kCycles
9       valid tests, 103229     avg kCycles
10      valid tests, 102377     avg kCycles
10      valid tests, 102594     avg kCycles
9       valid tests, 102947     avg kCycles
9       valid tests, 102224     avg kCycles
9       valid tests, 102925     avg kCycles
10      valid tests, 102337     avg kCycles
10      valid tests, 103004     avg kCycles
10      valid tests, 102270     avg kCycles
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 30, 2013, 02:22:59 AM
Thanks, Obivan, Alex, Dave and Gunther.

Results look pretty stable now, the next step would be to design a timer macro that starts at the beginning of the time slice and stops shortly before... if I find time ;-)
Title: Re: SwitchToThread vs Sleep(0)
Post by: dedndave on May 30, 2013, 03:37:13 AM
from what i am seeing, we don't violate that rule if we use 0.5 seconds as a loop count target
or, am i dropping a decimal point someplace - lol
Title: Re: SwitchToThread vs Sleep(0)
Post by: qWord on May 30, 2013, 05:32:29 AM
Quote from: jj2007 on May 30, 2013, 02:22:59 AM
Results look pretty stable now

Probably you have happily ignored the literature I've referenced, but your method is only stable as long as no one changes the timer resolution. Also, the values vary tremendously from the theoretical values (assuming no CPU throttling applies). The attached testbed shows the influence of the current timer resolution on the slice/quantum length. Note that you will only see a difference if no other application has already requested a timer resolution of 1ms (default values for the resolution are 10 or 15ms).
actual timer resolution : 0.01s , CPU frequency: 2294 MHz
clocks/quantum: 7.64667E+006
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
9       valid tests, 90789      avg kCycles
9       valid tests, 91772      avg kCycles
9       valid tests, 91738      avg kCycles
9       valid tests, 91776      avg kCycles
9       valid tests, 91796      avg kCycles
9       valid tests, 91827      avg kCycles
9       valid tests, 91797      avg kCycles
10      valid tests, 91540      avg kCycles
9       valid tests, 91812      avg kCycles
10      valid tests, 91622      avg kCycles

set timer resolution to 1ms
actual timer resolution : 0.001s , CPU frequency: 2294 MHz
clocks/quantum: 764667
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
9       valid tests, 73421      avg kCycles
9       valid tests, 73454      avg kCycles
9       valid tests, 73433      avg kCycles
9       valid tests, 73450      avg kCycles
9       valid tests, 73420      avg kCycles
9       valid tests, 73473      avg kCycles
9       valid tests, 73437      avg kCycles
10      valid tests, 73413      avg kCycles
10      valid tests, 73359      avg kCycles
10      valid tests, 73324      avg kCycles
ok
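
The "clocks/quantum" lines above are consistent with CPU frequency x timer resolution / 3. The Python sketch below reproduces them under that assumption; the divisor 3 is inferred from the printed numbers, not taken from the testbed source, and the function name is made up.

```python
# Illustrative reconstruction of the "clocks/quantum" figures.
# Assumption: value = CPU frequency * timer resolution / 3 (inferred from the output).
def clocks_per_quantum(cpu_mhz, timer_resolution_s):
    """Return the theoretical clock count for the given timer resolution."""
    return cpu_mhz * 1e6 * timer_resolution_s / 3


for resolution in (0.01, 0.001):
    print(f"{resolution}s -> {clocks_per_quantum(2294, resolution):.6g} clocks")
```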


qWord
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 30, 2013, 07:12:43 AM
Quote from: qWord on May 30, 2013, 05:32:29 AM
Quote from: jj2007 on May 30, 2013, 02:22:59 AM
Results look pretty stable now
Probably you have happily ignored the literature I've referenced

My apologies.

Quote
but your method is only stable as long as no one changes the timer resolution

So I would suggest not to change the timer resolution in the middle of a timing exercise. The goal of this thread is to determine how many test loops can be squeezed into one time slice, in order to get a) high loop counts but b) no (or very few) context switches that could influence the precision of the cycle count.
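
As a rough illustration of that goal, here is a Python sketch that estimates the loop count from a measured slice length. The function name max_loops, the 100 cycles per loop, and the safety factor are all assumptions for the example, not measured values.

```python
# Illustrative estimate of how many test loops fit into one time slice.
# The safety factor and the cycles-per-loop figure are assumptions.
def max_loops(slice_kcycles, cycles_per_loop, safety=0.5):
    """Return the number of loops that fit into the usable part of a slice."""
    usable_cycles = int(slice_kcycles * 1_000 * safety)
    return usable_cycles // cycles_per_loop


# Celeron M figure from above: about 49989 kCycles per slice
print(max_loops(49989, 100))
```

The safety factor leaves room for the fact that the test may not start exactly at the beginning of a slice.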
Title: Re: SwitchToThread vs Sleep(0)
Post by: qWord on May 30, 2013, 07:24:38 AM
Quote from: jj2007 on May 30, 2013, 07:12:43 AM
The goal of this thread is to determine how many test loops can be squeezed into one time slice, in order to get a) high loop counts but b) no (or very few) context switches that could influence the precision of the cycle count.

Sorry that I've wasted your time.
Title: Re: SwitchToThread vs Sleep(0)
Post by: jj2007 on May 30, 2013, 08:49:49 AM
Quote from: qWord on May 30, 2013, 07:24:38 AM
Sorry that I've wasted your time.

In general, I read your posts carefully as you are a very knowledgeable person. I can learn from you.