Author Topic: Multithreaded apps in 64 bit assembler  (Read 9990 times)

dedndave

  • Member
  • *****
  • Posts: 8734
  • Still using Abacus 2.0
    • DednDave
Re: Multithreaded apps in 64 bit assembler
« Reply #15 on: August 17, 2012, 01:33:42 AM »
i don't know if CMPXCHG implies lock or not
i am sure it is mentioned in the intel docs if it does
i prefer the simple XCHG
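
For reference, the Intel manuals do cover this: XCHG with a memory operand is always locked, while CMPXCHG is only atomic across processors when it carries an explicit LOCK prefix. Below is a minimal sketch of both styles of test-and-set, written in C11 rather than assembler. The names lock_word, try_acquire_xchg, try_acquire_cas and release are made up for illustration, and the instructions a given compiler emits may differ.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
static atomic_int lock_word = 0;

/* Test-and-set with an atomic exchange.
   On x86 this is typically compiled to XCHG [mem], reg. */
static bool try_acquire_xchg(atomic_int *lock) {
    return atomic_exchange(lock, 1) == 0;   /* old value 0: we got it */
}

/* Test-and-set with compare-and-swap.
   On x86 this is typically compiled to LOCK CMPXCHG [mem], reg. */
static bool try_acquire_cas(atomic_int *lock) {
    int expected = 0;                       /* only succeed if free */
    return atomic_compare_exchange_strong(lock, &expected, 1);
}

/* Release by storing 0 back into the lock word. */
static void release(atomic_int *lock) {
    atomic_store(lock, 0);
}
```

Either function returns true only for the caller that changed the word from 0 to 1, so a second attempt fails until release is called.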

MichaelW

  • Global Moderator
  • Member
  • *****
  • Posts: 1209
Re: Multithreaded apps in 64 bit assembler
« Reply #16 on: August 17, 2012, 02:36:06 AM »
Unless you change the minimum timer resolution, a Sleep 1 suspends the current thread for the same interval as a Sleep 10 (or, on recent systems, a Sleep 15) would. So why not use Sleep 0 instead?
Well Microsoft, here’s another nice mess you’ve gotten us into.

dedndave

« Reply #17 on: August 19, 2012, 08:33:43 AM »
i think we've been over that before, Michael - lol (a few times)
i don't believe the system timer resolution affects Sleep

jj2007

  • Member
  • *****
  • Posts: 7557
  • Assembler is fun ;-)
    • MasmBasic
Re: Multithreaded apps in 64 bit assembler
« Reply #18 on: August 19, 2012, 09:21:37 AM »
Sleep 0 is definitely a lot faster:

Sleep 1: 1562662 µs
Sleep 0: 2696 µs

Sleep 1: 1562470 µs
Sleep 0: 1225 µs

Sleep 1: 1562516 µs
Sleep 0: 769 µs

include \masm32\MasmBasic\MasmBasic.inc   ; download
  Init
  REPEAT 3
   NanoTimer()
   mov ebx, 100
   .Repeat
      invoke Sleep, 1
      dec ebx
   .Until Zero?
   Print Str$("Sleep 1: %i µs\n", NanoTimer(µs))
   NanoTimer()
   mov ebx, 100
   .Repeat
      invoke Sleep, 0
      dec ebx
   .Until Zero?
   Print Str$("Sleep 0: %i µs\n\n", NanoTimer(µs))
  ENDM
  Inkey
  Exit
end start
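
Dividing those totals by the 100 calls in each loop gives the cost per call. A small sketch in C (per_call_us and print_averages are hypothetical helper names, not part of MasmBasic): the first run works out to roughly 15.6 ms per Sleep 1, which would be consistent with one tick of a 64 Hz timer (1/64 s = 15.625 ms), against about 27 µs per Sleep 0.

```c
#include <stdio.h>

/* Average duration of one call, given the measured total for `calls` calls. */
static double per_call_us(double total_us, int calls) {
    return total_us / calls;
}

/* Print the averages for the first pair of measurements quoted above. */
static void print_averages(void) {
    printf("Sleep 1: %.1f us per call\n", per_call_us(1562662.0, 100));
    printf("Sleep 0: %.1f us per call\n", per_call_us(2696.0, 100));
}
```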

dedndave

« Reply #19 on: August 19, 2012, 09:45:14 AM »
well - faster, how ?
if a thread uses Sleep to allow other threads to run...
then using a smaller elapse time means the thread is consuming more time checking the semaphore   :P

it is a trade-off - you must decide what is appropriate for each individual case
fast response vs use of fewer cycles

sinsi

  • Member
  • ****
  • Posts: 996
Re: Multithreaded apps in 64 bit assembler
« Reply #20 on: August 19, 2012, 10:26:01 AM »
Quote from: dedndave
"i think we've been over that before, Michael - lol (a few times)
i don't believe the system timer resolution affects Sleep"

timeBeginPeriod affects the system timer for all processes.

Using "invoke timeBeginPeriod,1" in jj's code gives me:
Sleep 1: 100010 µs
Sleep 0: 15 µs

Sleep 1: 100011 µs
Sleep 0: 15 µs

Sleep 1: 100005 µs
Sleep 0: 15 µs

http://masm32.com/board/index.php?topic=322.0
I can walk on water but stagger on beer.

MichaelW

« Reply #21 on: August 19, 2012, 11:00:27 AM »
Quote from: dedndave
"i think we've been over that before, Michael - lol (a few times)
i don't believe the system timer resolution affects Sleep"

Yep, it’s like déjà vu all over again

Instead of depending on my memory of the performance counter not being affected by the minimum timer resolution, I coded a (not very accurate) timer that uses TSC as a time reference.
Code:
;==============================================================================
    include \masm32\include\masm32rt.inc
    include \masm32\include\winmm.inc
    includelib \masm32\lib\winmm.lib
    .686
;==============================================================================
    .data
        clkhz dq    0
        r8    REAL8 ?
    .code
;==============================================================================
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
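; Returns the current TSC reading, converted to seconds, in ST(0).
TscTimer proc
    .IF DWORD PTR clkhz == 0            ; first call: calibrate the TSC rate
        invoke Sleep, 3000              ; initial delay before calibrating
        rdtsc                           ; EDX:EAX = starting count
        push edx
        push eax
        invoke Sleep, 1000              ; nominal one-second interval
        rdtsc                           ; EDX:EAX = ending count
        pop ecx
        sub eax, ecx                    ; low dword of end - start
        pop ecx
        sbb edx, ecx                    ; high dword, including the borrow
        mov DWORD PTR clkhz, eax        ; store counts per nominal second
        mov DWORD PTR clkhz+4, edx
        printf("%I64dHz\n\n", clkhz)
    .ENDIF
    rdtsc
    push edx
    push eax
    fild QWORD PTR [esp]                ; load the 64-bit count from the stack
    add esp, 8
    fild clkhz
    fdiv                                ; ST(0) = count / clkhz = seconds
    ret
TscTimer endp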
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
;==============================================================================
start:
;==============================================================================

    invoke TscTimer
    fstp r8
    invoke TscTimer
    fstp r8
    invoke Sleep, 1000
    invoke TscTimer
    fld r8
    fsub
    fstp r8
    printf("%fs\n\n", r8)

    invoke TscTimer
    fstp r8
    REPEAT 1000
        invoke Sleep, 1
    ENDM
    invoke TscTimer
    fld r8
    fsub
    fstp r8
    printf("%fs\n", r8)

    invoke timeBeginPeriod, 1

    invoke TscTimer
    fstp r8
    REPEAT 1000
        invoke Sleep, 1
    ENDM
    invoke TscTimer
    fld r8
    fsub
    fstp r8
    printf("%fs\n\n", r8)

    invoke timeEndPeriod, 1

    inkey
    exit
;==============================================================================
END start

Running on my 500MHz P3 Windows 2000 system:
Code:
504248079Hz

0.999615s

9.999756s
1.011535s
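
The arithmetic in TscTimer can be restated in C as follows. This is only a sketch of the calculation, not a replacement for the MASM code: it takes the counter values as parameters instead of executing RDTSC, the names tsc_join and tsc_elapsed_s are hypothetical, and unlike the original it subtracts the raw counts before dividing by the calibrated frequency.

```c
#include <stdint.h>

/* Join the two 32-bit halves that RDTSC returns in EDX:EAX. */
static uint64_t tsc_join(uint32_t edx, uint32_t eax) {
    return ((uint64_t)edx << 32) | eax;
}

/* Elapsed seconds between two readings, given counts per second.
   Subtracting the integers first avoids rounding two large values. */
static double tsc_elapsed_s(uint64_t start, uint64_t end, uint64_t hz) {
    return (double)(end - start) / (double)hz;
}
```

With the 504248079 Hz figure reported above, a difference of 504248079 counts comes out as 1.0 s.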

dedndave

« Reply #22 on: August 19, 2012, 12:39:56 PM »
Prescott w/htt, XP MCE2005 SP3
Code:
3000103515Hz

0.999860s

1.955947s
1.974502s

although, i am not really sure how to interpret the results, Michael - lol

i have attached an assembled copy for others to try...

MichaelW

« Reply #23 on: August 19, 2012, 02:22:57 PM »
P4 Northwood, Windows XP SP3:
Code:
2992338536Hz

0.986793s

15.625224s
1.965253s

The point was to demonstrate that the Sleep period does depend on the minimum timer resolution. Perhaps on XP MCE it doesn't, or the minimum timer resolution is stuck at 1 ms.
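
One possible reading of the three sets of results, offered as a model rather than documented behaviour: a Sleep n that starts on a timer tick returns on the first tick at or after n ms. The sketch below is in C with hypothetical names (modeled_sleep_ms, modeled_loop_s), and the tick lengths of 10 ms, 15.625 ms, 1 ms and 1/1.024 ms are assumptions chosen to match the posted figures, not values read from the systems.

```c
/* Model: a sleep that starts on a tick boundary ends on the first tick
   at or after the requested time. All values are in milliseconds. */
static double modeled_sleep_ms(double request_ms, double tick_ms) {
    long ticks = (long)(request_ms / tick_ms);   /* whole ticks that fit */
    if (ticks * tick_ms < request_ms) {
        ticks++;                                 /* round up to next tick */
    }
    return ticks * tick_ms;
}

/* Total time in seconds for `calls` back-to-back sleeps. */
static double modeled_loop_s(int calls, double request_ms, double tick_ms) {
    return calls * modeled_sleep_ms(request_ms, tick_ms) / 1000.0;
}
```

Under those assumptions, 1000 calls of Sleep 1 come to 10 s with a 10 ms tick (9.999756 s measured on Windows 2000), 15.625 s with a 15.625 ms tick (15.625224 s on the P4), 1 s with a 1 ms tick (1.011535 s on Windows 2000 after timeBeginPeriod), and about 1.953 s with a 1/1.024 ms tick, which is close to the roughly 1.96 s to 1.97 s figures on XP. Read this way, the two nearly equal results on the Prescott would mean that system was already running at the finer resolution before timeBeginPeriod was called.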

MichaelW

« Reply #24 on: August 19, 2012, 02:32:16 PM »
I changed the source so it reports the minimum and maximum timer resolutions:
Code:
;==============================================================================
    include \masm32\include\masm32rt.inc
    include \masm32\include\winmm.inc
    includelib \masm32\lib\winmm.lib
    .686
;==============================================================================
    .data
        clkhz dq    0
        r8    REAL8 ?
        tc    TIMECAPS <>
    .code
;==============================================================================
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
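; Returns the current TSC reading, converted to seconds, in ST(0).
TscTimer proc
    .IF DWORD PTR clkhz == 0            ; first call: calibrate the TSC rate
        invoke Sleep, 3000              ; initial delay before calibrating
        rdtsc                           ; EDX:EAX = starting count
        push edx
        push eax
        invoke Sleep, 1000              ; nominal one-second interval
        rdtsc                           ; EDX:EAX = ending count
        pop ecx
        sub eax, ecx                    ; low dword of end - start
        pop ecx
        sbb edx, ecx                    ; high dword, including the borrow
        mov DWORD PTR clkhz, eax        ; store counts per nominal second
        mov DWORD PTR clkhz+4, edx
        printf("%I64dHz\n\n", clkhz)
    .ENDIF
    rdtsc
    push edx
    push eax
    fild QWORD PTR [esp]                ; load the 64-bit count from the stack
    add esp, 8
    fild clkhz
    fdiv                                ; ST(0) = count / clkhz = seconds
    ret
TscTimer endp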
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
;==============================================================================
start:
;==============================================================================
    invoke timeGetDevCaps, ADDR tc, SIZEOF tc
    printf("min %d\tmax %d\n\n", tc.wPeriodMin, tc.wPeriodMax)

    invoke TscTimer
    fstp r8
    invoke TscTimer
    fstp r8
    invoke Sleep, 1000
    invoke TscTimer
    fld r8
    fsub
    fstp r8
    printf("%fs\n\n", r8)

    invoke TscTimer
    fstp r8
    REPEAT 1000
        invoke Sleep, 1
    ENDM
    invoke TscTimer
    fld r8
    fsub
    fstp r8
    printf("%fs\n", r8)

    invoke timeBeginPeriod, 1

    invoke TscTimer
    fstp r8
    REPEAT 1000
        invoke Sleep, 1
    ENDM
    invoke TscTimer
    fld r8
    fsub
    fstp r8
    printf("%fs\n\n", r8)

    invoke timeEndPeriod, 1

    inkey
    exit
;==============================================================================
END start

And on both systems I get:

min 1    max 1000000

jj2007

« Reply #25 on: August 19, 2012, 08:31:59 PM »
The question is really "what does a sleep 0 effectively do?". My code calls it 100 times inside a loop, and the response times show clearly that it does not wait a lot. But does it wait at all?

Measuring with rdtsc, on my Celeron it seems that a Sleep 0 is roughly 1600 cycles:
1593372 cycles for 1000*Sleep 0: 1061 µs
So it does hang around somewhere and wait for something, e.g. a new timeslice...

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 4813
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Multithreaded apps in 64 bit assembler
« Reply #26 on: August 19, 2012, 10:36:52 PM »
JJ,

The spec on SleepEx() is that if you set the delay to zero, it returns immediately when there is no other thread ready to run. Call SleepEx,1,0 instead and you force a yield even if no other thread is waiting on that core.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :biggrin:

MichaelW

« Reply #27 on: August 19, 2012, 11:26:55 PM »
Quote from: dedndave
"if a thread uses Sleep to allow other threads to run...
then using a smaller elapse time means the thread is consuming more time checking the semaphore   :P"

Regarding the use of Sleep 0, if there is no other thread ready to run then what is the problem with consuming time checking the semaphore? And if there is some other thread ready to run, then it will receive whatever is left of the current timeslice, which I think would typically be most of the timeslice.
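
The wait loop under discussion can be sketched in C as below. Assumptions: the lock is a plain 0/1 word rather than a real Win32 semaphore, acquire_with_yield and yield_slice are hypothetical names, a C11 compiler with <stdatomic.h> is available, and on non-Windows systems sched_yield() stands in for Sleep(0) so the sketch builds elsewhere.

```c
#include <stdatomic.h>
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
/* Give up the rest of the timeslice if another thread is ready. */
static void yield_slice(void) { Sleep(0); }
#else
#include <sched.h>
/* Stand-in for Sleep(0) on non-Windows systems (an assumption). */
static void yield_slice(void) { sched_yield(); }
#endif

/* Try to take the lock (0 = free, 1 = held) up to max_tries times,
   yielding between attempts. Returns true once the lock is ours. */
static bool acquire_with_yield(atomic_int *lock, int max_tries) {
    for (int i = 0; i < max_tries; ++i) {
        if (atomic_exchange(lock, 1) == 0) {
            return true;            /* the word was free and is now held */
        }
        yield_slice();              /* let the owner make progress */
    }
    return false;                   /* still held after max_tries checks */
}
```

Replacing Sleep(0) in yield_slice with Sleep(1) gives the other side of the trade-off raised earlier in the thread: fewer checks of the lock word, at the cost of waiting at least one timer tick before each retry.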

dedndave

« Reply #28 on: August 20, 2012, 01:39:36 AM »
the assumption there is that if some other thread owns the semaphore, it must be doing something   :P
you want to allow it to finish its work so it will release the semaphore
« Last Edit: August 20, 2012, 03:24:37 AM by dedndave »

jj2007

« Reply #29 on: August 20, 2012, 01:52:25 AM »
MSDN: "If you specify 0 milliseconds, the thread will relinquish the remainder of its time slice but remain ready. Note that a ready thread is not guaranteed to run immediately."

My interpretation is that
- there is always another thread ready to run;
- the next action of the current thread happens at precisely the moment when it's been given a crispy fresh timeslice, which might not be "immediately".