The MASM Forum

General => The Campus => Topic started by: AKRichard on August 08, 2012, 09:06:45 AM

Title: Multithreaded apps in 64 bit assembler
Post by: AKRichard on August 08, 2012, 09:06:45 AM
Hello again all,

  I posted about memory management a few weeks ago, and a problem has cropped up since then.  After writing the memory management routines I started getting errors when running the library in a multithreaded app (it used to run fine in multithreaded apps), although it runs perfectly in a single thread.  The problem I am having is figuring out HOW to debug it.  It never errors out on the first run (usually only after a few thousand runs), and when it does error, most of the time it trashes every object being used.  I had set up code in the algorithms to try to catch the errors before it crashes, with stuff like

cmp          rax, 0
ja             ItsGood
mov          rax, rax

ItsGood:

mov           rcx, qword ptr[rax]      ; <-------- will error here, showing rax with 0


when checking the return values of calls for memory (of course there is a breakpoint set at the mov     rax, rax).  At this point, most of the time if I try to stop debugging it takes 5 minutes or so to stop, or I have to use Task Manager to kill it.

Also, every once in a great while the debugger will stop and tell me something about an error there is no handler for (something like that, and it doesn't tell me WHAT the error is).

Anyhow, I am not finding much out there about debugging multithreaded assembler programs.

I am using Visual Studio 2010, .NET 4, mixed mode managed/native; the assembler is in an asm file all to itself.

Thanks
Title: Re: Multithreaded apps in 64 bit assembler
Post by: qWord on August 08, 2012, 09:13:51 AM
Aside from your debugging problem: did you consider synchronization of shared resources? Not doing so will cause unpredictable behavior.
Title: Re: Multithreaded apps in 64 bit assembler
Post by: AKRichard on August 08, 2012, 04:30:15 PM
As far as the managed side of the library goes, there are no shared resources; I have the library set up as immutable.  Whenever a change is made, what you're getting is a whole new object.  On the native side, there really is nothing to share; all the native side is there for is to pass variables down to the assembly language routines and pass the return values back.  My test program consuming the library is not sharing any of the resources: each thread calls the same function, and everything the function uses is created locally, so each thread should have its own copy of everything.

  It was running fine in multithreaded mode before I changed the memory handling of the assembly routines, so I am pretty sure the problem is there.  I am just having a hard time figuring out how to debug it.  Since the errors don't happen right away, stepping into it wouldn't work (I could be here all night before I actually stepped into the error).  But I haven't been able to trap the error like I could in single-threaded mode; somehow, it bypasses the traps.  Finally, if I just wait for the error, I lose everything except the call stack: most of the registers read 0, all the local variables read 0, and all the objects at the native and managed levels read undefined.

  If I am understanding correctly, the runtime dumps all the registers to the stack when changing context, but what about local variables within the assembly language routines?  I do see how it could wreck things if it didn't save the local variables, though I would've thought it would have shown up long before now.  I was just wondering if I'm not using the correct tools.  Do I need to configure it differently?  Would sacrificing a chicken help?
Title: Re: Multithreaded apps in 64 bit assembler
Post by: AKRichard on August 13, 2012, 03:34:00 PM
Doesn't anyone have a little advice?  I finally figured out a way to get my assembly code to print to a text file, though I am sure there is a more efficient way.  However, I am now running into issues with writing to a text file in multithreaded mode.  Please don't laugh too hard; it looks like I found about 30 more subjects I need to study in order to write algorithms in assembly language.

  I have a few more questions.

1.  Do I have all the synchronization mechanisms available in native C++?
2.  Are there other mechanisms I should be considering?
3.  I have an initialization section of code that should only be run once, but I have found out that if the runtime switches threads at an unfortunate moment (right after the first thread passes the check to see whether it has been initialized, but before it sets the variable to signal that initialization has begun), then I end up with 2 threads attempting the initialization.  How do you deal with that?
4.  Are there any good articles about coding assembly language for multithreaded apps (and debugging multithreaded apps)?  I've found precious little on this subject, especially on the 64-bit platform.


  I answered the question about shared resources way too early.  The memory management is a shared resource in the assembly code, and I am sure that is what is screwing me up, but I still haven't been able to catch it in action.

  I really don't mind working my way through the problem, but I am having a hard time with the lack of information I am finding on this subject.
Title: Re: Multithreaded apps in 64 bit assembler
Post by: qWord on August 14, 2012, 12:50:17 AM
Quote from: AKRichard on August 13, 2012, 03:34:00 PM
1.  Do I have all the synchronization mechanisms available in native C++?
When writing assembler you can use the Windows synchronization functions (http://msdn.microsoft.com/en-us/library/windows/desktop/ms686353(v=vs.85).aspx) and/or the LOCK instruction prefix, which allows atomic memory access (see the documentation).
Quote from: AKRichard on August 13, 2012, 03:34:00 PM
3.  I have an initialization section of code that should only be run once, but I have found out that if the runtime switches threads at an unfortunate moment (right after the first thread passes the check to see whether it has been initialized, but before it sets the variable to signal that initialization has begun), then I end up with 2 threads attempting the initialization.  How do you deal with that?
The simplest solution is to use an initialization function that is called from the main thread at startup. This routine then creates the needed synchronization objects.
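
Where calling an initialization function from the main thread is not practical, the check and the set can be merged into a single atomic step, which is what LOCK CMPXCHG provides in assembly. What follows is only a minimal C++ sketch of that idea, assuming a three-state flag. The names g_init_state, g_init_runs, do_initialize and ensure_initialized are made up for illustration and do not come from the original code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative states for a one-time initializer (hypothetical names).
constexpr int kUninit = 0;
constexpr int kInProgress = 1;
constexpr int kReady = 2;

std::atomic<int> g_init_state{kUninit};
int g_init_runs = 0;  // counts how often the initializer body actually ran

// Placeholder for the real setup work, e.g. creating synchronization objects.
void do_initialize() { ++g_init_runs; }

// Returns once initialization has completed; the body runs at most once.
void ensure_initialized() {
    int expected = kUninit;
    // Check and set in one atomic step: only one caller can move the state
    // from kUninit to kInProgress.
    if (g_init_state.compare_exchange_strong(expected, kInProgress,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
        do_initialize();
        g_init_state.store(kReady, std::memory_order_release);
        return;
    }
    // Every other caller waits until the winner publishes kReady.
    while (g_init_state.load(std::memory_order_acquire) != kReady) {
        std::this_thread::yield();
    }
    assert(g_init_runs == 1);
}
```

Each thread would call ensure_initialized() before touching the shared state. The losing threads never run the initializer; they only wait for it to finish.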

Quote from: AKRichard on August 13, 2012, 03:34:00 PM
I answered the question about shared resources way too early.  The memory management is a shared resource in the assembly code, and I am sure that is what is screwing me up, but I still haven't been able to catch it in action.

  I really don't mind working my way through the problem, but I am having a hard time with the lack of information I am finding on this subject.
This problem is not trivial; you may want to consider using the ready-to-use Heap Functions.
Title: Re: Multithreaded apps in 64 bit assembler
Post by: AKRichard on August 15, 2012, 08:02:48 AM
Quote from: qWord on August 14, 2012, 12:50:17 AM
When writing assembler you can use the Windows synchronization functions (http://msdn.microsoft.com/en-us/library/windows/desktop/ms686353(v=vs.85).aspx) and/or the LOCK instruction prefix, which allows atomic memory access (see the documentation).

That's what I thought.  I am pretty sure I understand the process of creating the structures needed in assembly, but how do I access methods within a particular instance?  For example, with semaphores, how would I call the Release method on a particular instance?  I haven't even started reading on these subjects yet, but even if I fail to find much on the web I'll probably be able to find the answers in masm32 somewhere.

Quote from: qWord on August 14, 2012, 12:50:17 AM
The simplest solution is to use an initialization function that is called from the main thread at startup. This routine then creates the needed synchronization objects.

That is what I figured out shortly after I posted the question.  Now the assembly code is initialized the very first time an instance of the managed class is instantiated.

Quote from: qWord on August 14, 2012, 12:50:17 AM
This problem is not trivial; you may want to consider using the ready-to-use Heap Functions.

  Those are the functions I am using in the release version of my code for use within my other programs, but I like the response I am getting from these routines using the memory management I've created, though every time I identify another bug and fix it, it does lose some speed (it's amazing how fast an incorrectly running algorithm can run lol).

  All of this brings up a whole slew of other subjects I need to study up on, which I will work on in my own time instead of bugging the forums too much, but I do have a few quick questions.

  From what I am reading on the lock prefix in the AMD docs, it sounds like it is specific to isolating access to resources between processors (not threads specifically).  If that is correct, would these problems disappear on a single-processor system?  I don't have a single-processor system at my disposal at the moment, but am thinking about dropping an old motherboard into a case to try it out.

  While reading about the lock prefix in the AMD docs, I came across the fence instructions.  The docs mention that a fence can be used for ordering memory reads and writes.  What I am confused about is atomicity: are the fence instructions used to make a set of instructions execute atomically?  I read that they are used for controlling the order of memory accesses, but I don't think I understand it clearly.
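
On the fence question: fences, and the acquire/release orderings that compilers map onto the hardware, constrain the order in which memory accesses become visible to other processors. They do not make a group of instructions atomic. The C++ sketch below separates the two ideas. The names payload, ready, publish, try_consume, counter and bump are hypothetical and used only for illustration.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the payload

// Producer: write the data, then set the flag with release ordering, so the
// payload write cannot become visible after the flag write.
void publish(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Consumer: returns true and fills `out` only if the flag was observed set.
// The acquire load keeps the payload read from moving ahead of the flag read.
bool try_consume(int& out) {
    if (!ready.load(std::memory_order_acquire)) return false;
    out = payload;
    return true;
}

// Ordering is not atomicity: a counter updated by several threads still needs
// an atomic read-modify-write, with or without fences around it.
std::atomic<long> counter{0};
long bump() { return counter.fetch_add(1, std::memory_order_relaxed) + 1; }
```

Ordering answers "in what sequence do other threads see my writes", while atomicity answers "can another thread slip in between my read and my write". The memory routine discussed in this thread needs the second property, which is why the LOCK prefix matters there.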

Thanks again for the reply
Title: Re: Multithreaded apps in 64 bit assembler
Post by: AKRichard on August 15, 2012, 04:51:59 PM
Turns out the answer was quite a bit simpler than I thought it would be.  Your suggestion of using the lock prefix was great; all it took was


mov rax, 0
mov rbx, 1

TryAgain:

cmpxchg         _gm, rbx
jnz TryAgain
sub rsp, 28h
call GoMem64
add rsp, 28h


to get the memory routine to run on only one thread/processor at a time, and it appears to be running fine now in multithreaded mode.  It did cost me some of my gains, but now that I understand a little better I can play with it to make it better.

  I would still be interested in knowing how y'all debug multithreaded apps yourselves.  I never could catch the problem while debugging; I only managed to guess what it was.  I was able to change my code to make it stop crashing, but I never managed to trap the errors before they happened or catch the problem in any other way.

Anyways, thanks, you pointed straight to a solution.
Title: Re: Multithreaded apps in 64 bit assembler
Post by: dedndave on August 15, 2012, 09:33:31 PM
XCHG implies a LOCK, as well
i use it from time to time...
        mov     al,1
        xchg    al,bSemaphore       ;if semaphore = 1, query
        cmp     al,1
        jz      sleep_then_try_again

;otherwise, you own the semaphore
Title: Re: Multithreaded apps in 64 bit assembler
Post by: MichaelW on August 16, 2012, 01:17:26 AM
XCHG has an implicit lock only when a memory operand is involved.

Also, from  here (http://technet.microsoft.com/en-us/library/ms684122(VS.85).aspx):

Quote
Simple reads and writes to properly-aligned 32-bit variables are atomic operations. In other words, you will not end up with only one portion of the variable updated; all bits are updated in an atomic fashion. However, access is not guaranteed to be synchronized. If two threads are reading and writing from the same variable, you cannot determine if one thread will perform its read operation before the other performs its write operation.
Simple reads and writes to properly aligned 64-bit variables are atomic on 64-bit Windows. Reads and writes to 64-bit values are not guaranteed to be atomic on 32-bit Windows. Reads and writes to variables of other sizes are not guaranteed to be atomic on any platform.


Title: Re: Multithreaded apps in 64 bit assembler
Post by: AKRichard on August 16, 2012, 07:20:07 AM
in the amd docs:

Quote
The LOCK prefix can only be used with forms of the following instructions that write a memory
operand: ADC, ADD, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, CMPXCHG16B, DEC,
INC, NEG, NOT, OR, SBB, SUB, XADD, XCHG, and XOR. An invalid-opcode exception occurs if
the LOCK prefix is used with any other instruction.

dedndave, how do you release the semaphore?  Originally I was thinking of using a semaphore or something, but I couldn't find an example of how to release it.  I think the solution I came up with was rather simple: using the _gm variable to signal when another thread is in the memory management routine, and resetting the variable right before the RET statement (only the one lock prefix used).  I've already run through some 15 million iterations of the algorithms and it hasn't errored yet.  It's almost disappointing that it could be that simple but took me a week to get here.
Title: Re: Multithreaded apps in 64 bit assembler
Post by: dedndave on August 16, 2012, 08:31:26 AM
with XCHG reg,mem, you get a bus lock
that means only one thread may access that memory at a given time

when you put a 1 there, it tells other threads that it is being queried
the way i use it - the other bits can mean other things
if you XCHG and get a 1 back - you do not have access
if you XCHG and get back a value other than 1, it is the semaphore and you own it
to release it, you put the original byte back
other threads will wait for that bit to be low before accessing it and whatever it locks

if i am waiting for the semaphore, i use the Sleep function to allow other threads/processes to run
it isn't critical, but i use something like 1 to 10 mS
it depends on how fast you want things to process
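
The exchange-based scheme described above can be sketched in C++, where std::atomic::exchange plays the role of XCHG reg,mem. This is a simplified two-state version (0 = free, 1 = owned) rather than the multi-bit byte described in the post, and the names semaphore_byte, try_acquire, acquire and release are assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical byte-sized flag standing in for bSemaphore: 0 = free, 1 = owned.
std::atomic<unsigned char> semaphore_byte{0};

// One XCHG-style attempt: write 1 and look at what was there before.
// Getting 0 back means the flag was free and the caller now owns it.
bool try_acquire(std::atomic<unsigned char>& flag) {
    return flag.exchange(1, std::memory_order_acquire) == 0;
}

// Keep trying, sleeping between attempts so the current owner can finish.
void acquire(std::atomic<unsigned char>& flag, std::chrono::milliseconds pause) {
    while (!try_acquire(flag)) {
        std::this_thread::sleep_for(pause);
    }
}

// Release by putting the "free" value back.
void release(std::atomic<unsigned char>& flag) {
    assert(flag.load(std::memory_order_relaxed) == 1);  // must be owned
    flag.store(0, std::memory_order_release);
}
```

A thread that finds the flag taken overwrites 1 with 1, so a failed attempt changes nothing. No separate compare is needed before the exchange, which means there is no window between a check and a set.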
Title: Re: Multithreaded apps in 64 bit assembler
Post by: qWord on August 16, 2012, 09:16:43 AM
The problem is that you can't efficiently build a signaling mechanism between threads using the LOCK prefix. Your code enters a loop for waiting, and that forces the OS to assign more execution time to that thread. The thread that currently 'owns' the mutex is disadvantaged, so it takes longer to release it.
IMO there is no way around using the corresponding OS functions if waiting is required. However, a loop that executes a few times before calling a blocking API function may be an interesting option (I wouldn't be surprised if this is even done by some of the APIs).
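
The "spin a few times, then hand over to the OS" idea can be sketched as follows. This is an illustration under assumptions, not a drop-in replacement: std::this_thread::yield() is used as a portable stand-in for a real blocking wait, and the type SpinThenYieldLock and its members are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct SpinThenYieldLock {
    std::atomic<bool> held{false};

    // Cheap read first, so waiters do not keep writing to the cache line.
    bool try_lock() {
        return !held.load(std::memory_order_relaxed) &&
               !held.exchange(true, std::memory_order_acquire);
    }

    // Tries up to spin_limit times in a row (at least once), then gives up
    // the time slice before the next round. Returns the number of yields.
    int lock(int spin_limit) {
        int yields = 0;
        for (;;) {
            int i = 0;
            do {
                if (try_lock()) return yields;
            } while (++i < spin_limit);
            std::this_thread::yield();  // stand-in for a blocking OS wait
            ++yields;
        }
    }

    void unlock() {
        assert(held.load(std::memory_order_relaxed));  // must be held
        held.store(false, std::memory_order_release);
    }
};
```

Windows critical sections expose the same trade-off through a spin count (InitializeCriticalSectionAndSpinCount): a short spin avoids a kernel transition when the lock is about to be released, and the wait falls back to the kernel when it is not.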
Title: Re: Multithreaded apps in 64 bit assembler
Post by: hool on August 17, 2012, 12:42:15 AM
Quote from: AKRichard on August 15, 2012, 04:51:59 PM
... some code ... to get the memory routine to run on only one thread/processor at a time and appears to be running fine now in multithreaded mode.
I'm afraid you got lucky. CMPXCHG is very fast compared to switching threads, for example, so it would probably take a long time before your software starts misbehaving. Do use the LOCK prefix.

Quote from: dedndave
XCHG implies a LOCK, as well
Ouch, by "as well" you meant cmpxchg?
Title: Re: Multithreaded apps in 64 bit assembler
Post by: dedndave on August 17, 2012, 12:51:33 AM
"as well", as in "also"   :P

when you use XCHG reg,mem, a LOCK prefix is implied
i.e., you do not have to explicitly use LOCK

the reasons i use this method - you control the Sleep time
if you want faster semaphore tests, use 1 mS
if you want the thread that owns the semaphore to have more time, use 10 mS or more

and - you don't have to worry about OS version - lol

i have played with mutexes and semaphores via the API functions, as well
the XCHG method takes a little more code, but it is simple code
Title: Re: Multithreaded apps in 64 bit assembler
Post by: hool on August 17, 2012, 12:55:17 AM
how cool  :lol:
cmpxchg implies a lock  :dazzled:

I am just following the discussion post by post from above
Title: Re: Multithreaded apps in 64 bit assembler
Post by: dedndave on August 17, 2012, 01:33:42 AM
i don't know if CMPXCHG implies lock or not
i am sure it is mentioned in the intel docs if it does
i prefer the simple XCHG
Title: Re: Multithreaded apps in 64 bit assembler
Post by: MichaelW on August 17, 2012, 02:36:06 AM
Instead of a Sleep 1, which unless you change the minimum timer resolution will suspend the current thread for the same interval as a Sleep 10, or on recent systems a Sleep 15 would, why not use Sleep 0?
Title: Re: Multithreaded apps in 64 bit assembler
Post by: dedndave on August 19, 2012, 08:33:43 AM
i think we've been over that before, Michael - lol (a few times)
i don't believe the system timer resolution affects Sleep
Title: Re: Multithreaded apps in 64 bit assembler
Post by: jj2007 on August 19, 2012, 09:21:37 AM
Sleep 0 is definitely a lot faster:

Sleep 1: 1562662 µs
Sleep 0: 2696 µs

Sleep 1: 1562470 µs
Sleep 0: 1225 µs

Sleep 1: 1562516 µs
Sleep 0: 769 µs

include \masm32\MasmBasic\MasmBasic.inc   ; download (http://masm32.com/board/index.php?topic=94.0)
  Init
  REPEAT 3
   NanoTimer()
   mov ebx, 100
   .Repeat
      invoke Sleep, 1
      dec ebx
   .Until Zero?
   Print Str$("Sleep 1: %i µs\n", NanoTimer(µs))
   NanoTimer()
   mov ebx, 100
   .Repeat
      invoke Sleep, 0
      dec ebx
   .Until Zero?
   Print Str$("Sleep 0: %i µs\n\n", NanoTimer(µs))
  ENDM
  Inkey
  Exit
end start
Title: Re: Multithreaded apps in 64 bit assembler
Post by: dedndave on August 19, 2012, 09:45:14 AM
well - faster, how ?
if a thread uses Sleep to allow other threads to run...
then using a smaller elapse time means the thread is consuming more time checking the semaphore   :P

it is a trade-off - you must decide what is appropriate for each individual case
fast response vs use of fewer cycles
Title: Re: Multithreaded apps in 64 bit assembler
Post by: sinsi on August 19, 2012, 10:26:01 AM
Quote from: dedndave on August 19, 2012, 08:33:43 AM
i think we've been over that before, Michael - lol (a few times)
i don't believe the system timer resolution affects Sleep
timeBeginPeriod affects the system timer for all processes.

Using "invoke timeBeginPeriod,1" in jj's code gives me:
Sleep 1: 100010 µs
Sleep 0: 15 µs

Sleep 1: 100011 µs
Sleep 0: 15 µs

Sleep 1: 100005 µs
Sleep 0: 15 µs

http://masm32.com/board/index.php?topic=322.0
Title: Re: Multithreaded apps in 64 bit assembler
Post by: MichaelW on August 19, 2012, 11:00:27 AM
Quote from: dedndave on August 19, 2012, 08:33:43 AM
i think we've been over that before, Michael - lol (a few times)
i don't believe the system timer resolution affects Sleep
Yep, it's like déjà vu all over again :icon_eek:

Instead of depending on my memory of the performance counter not being affected by the minimum timer resolution, I coded a (not very accurate) timer that uses TSC as a time reference.

;==============================================================================
    include \masm32\include\masm32rt.inc
    include \masm32\include\winmm.inc
    includelib \masm32\lib\winmm.lib
    .686
;==============================================================================
    .data
        clkhz dq    0
        r8    REAL8 ?
    .code
;==============================================================================
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
TscTimer proc
    .IF DWORD PTR clkhz == 0
        invoke Sleep, 3000
        rdtsc
        push edx
        push eax
        invoke Sleep, 1000
        rdtsc
        pop ecx
        sub eax, ecx
        pop ecx
        sbb edx, ecx
        mov DWORD PTR clkhz, eax
        mov DWORD PTR clkhz+4, edx
        printf("%I64dHz\n\n", clkhz)
    .ENDIF
    rdtsc
    push edx
    push eax
    fild QWORD PTR [esp]
    add esp, 8
    fild clkhz
    fdiv
    ret
TscTimer endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
;==============================================================================
start:
;==============================================================================

    invoke TscTimer
    fstp r8
    invoke TscTimer
    fstp r8
    invoke Sleep, 1000
    invoke TscTimer
    fld r8
    fsub
    fstp r8
    printf("%fs\n\n", r8)

    invoke TscTimer
    fstp r8
    REPEAT 1000
        invoke Sleep, 1
    ENDM
    invoke TscTimer
    fld r8
    fsub
    fstp r8
    printf("%fs\n", r8)

    invoke timeBeginPeriod, 1

    invoke TscTimer
    fstp r8
    REPEAT 1000
        invoke Sleep, 1
    ENDM
    invoke TscTimer
    fld r8
    fsub
    fstp r8
    printf("%fs\n\n", r8)

    invoke timeEndPeriod, 1

    inkey
    exit
;==============================================================================
END start


Running on my 500MHz P3 Windows 2000 system:

504248079Hz

0.999615s

9.999756s
1.011535s

Title: Re: Multithreaded apps in 64 bit assembler
Post by: dedndave on August 19, 2012, 12:39:56 PM
Prescott w/htt, XP MCE2005 SP3
3000103515Hz

0.999860s

1.955947s
1.974502s


although, i am not really sure how to interpret the results, Michael - lol

i have attached an assembled copy for others to try...
Title: Re: Multithreaded apps in 64 bit assembler
Post by: MichaelW on August 19, 2012, 02:22:57 PM
P4 Northwood, Windows XP SP3:

2992338536Hz

0.986793s

15.625224s
1.965253s


The point was to demonstrate that the Sleep period does depend on the minimum timer resolution. Perhaps on XP MCE it doesn't, or the minimum timer resolution is stuck at 1.
Title: Re: Multithreaded apps in 64 bit assembler
Post by: MichaelW on August 19, 2012, 02:32:16 PM
I changed the source so it reports the minimum and maximum timer resolutions:

;==============================================================================
    include \masm32\include\masm32rt.inc
    include \masm32\include\winmm.inc
    includelib \masm32\lib\winmm.lib
    .686
;==============================================================================
    .data
        clkhz dq    0
        r8    REAL8 ?
        tc    TIMECAPS <>
    .code
;==============================================================================
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
TscTimer proc
    .IF DWORD PTR clkhz == 0
        invoke Sleep, 3000
        rdtsc
        push edx
        push eax
        invoke Sleep, 1000
        rdtsc
        pop ecx
        sub eax, ecx
        pop ecx
        sbb edx, ecx
        mov DWORD PTR clkhz, eax
        mov DWORD PTR clkhz+4, edx
        printf("%I64dHz\n\n", clkhz)
    .ENDIF
    rdtsc
    push edx
    push eax
    fild QWORD PTR [esp]
    add esp, 8
    fild clkhz
    fdiv
    ret
TscTimer endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
;==============================================================================
start:
;==============================================================================
    invoke timeGetDevCaps, ADDR tc, SIZEOF tc
    printf("min %d\tmax %d\n\n", tc.wPeriodMin, tc.wPeriodMax)

    invoke TscTimer
    fstp r8
    invoke TscTimer
    fstp r8
    invoke Sleep, 1000
    invoke TscTimer
    fld r8
    fsub
    fstp r8
    printf("%fs\n\n", r8)

    invoke TscTimer
    fstp r8
    REPEAT 1000
        invoke Sleep, 1
    ENDM
    invoke TscTimer
    fld r8
    fsub
    fstp r8
    printf("%fs\n", r8)

    invoke timeBeginPeriod, 1

    invoke TscTimer
    fstp r8
    REPEAT 1000
        invoke Sleep, 1
    ENDM
    invoke TscTimer
    fld r8
    fsub
    fstp r8
    printf("%fs\n\n", r8)

    invoke timeEndPeriod, 1

    inkey
    exit
;==============================================================================
END start


And on both systems I get:

min 1    max 1000000
Title: Re: Multithreaded apps in 64 bit assembler
Post by: jj2007 on August 19, 2012, 08:31:59 PM
The question is really "what does a sleep 0 effectively do?". My code calls it 100 times inside a loop, and the response times show clearly that it does not wait a lot. But does it wait at all?

Measuring with rdtsc, on my Celeron it seems that a Sleep 0 is roughly 1600 cycles:
1593372 cycles for 1000*Sleep 0: 1061 µs
So it does hang around somewhere and wait for something, e.g. a new timeslice...
Title: Re: Multithreaded apps in 64 bit assembler
Post by: hutch-- on August 19, 2012, 10:36:52 PM
JJ,

The spec on SleepEx() is if you set the delay to zero it will immediately return if there is no other thread to run. Set it to SleepEx,1,0 and you will force a yield even if there is no other thread on that core that is waiting.
Title: Re: Multithreaded apps in 64 bit assembler
Post by: MichaelW on August 19, 2012, 11:26:55 PM
Quote from: dedndave on August 19, 2012, 09:45:14 AM
if a thread uses Sleep to allow other threads to run...
then using a smaller elapse time means the thread is consuming more time checking the semaphore   :P

Regarding the use of Sleep 0, if there is no other thread ready to run then what is the problem with consuming time checking the semaphore? And if there is some other thread ready to run, then it will receive whatever is left of the current timeslice, which I think would typically be most of the timeslice.
Title: Re: Multithreaded apps in 64 bit assembler
Post by: dedndave on August 20, 2012, 01:39:36 AM
the assumption there is that if some other thread owns the semaphore, it must be doing something   :P
you want to allow it to finish its work so it will release the semaphore
Title: Re: Multithreaded apps in 64 bit assembler
Post by: jj2007 on August 20, 2012, 01:52:25 AM
MSDN (http://msdn.microsoft.com/en-us/library/windows/desktop/ms686307%28v=vs.85%29.aspx): If you specify 0 milliseconds, the thread will relinquish the remainder of its time slice but remain ready. Note that a ready thread is not guaranteed to run immediately

My interpretation is that
- there is always another thread ready to run;
- the next action of the current thread happens in precisely the moment when it's been given a crispy fresh timeslice, which might not be "immediately".
Title: Re: Multithreaded apps in 64 bit assembler
Post by: dedndave on August 20, 2012, 03:28:54 AM
that is pretty much how i interpret it, too
although - the operating system supposedly gives a slightly higher priority to the process of the foreground window
that changes the overall picture a little
in all of our testing, i think we have had the foreground window
i don't think i've seen a case where you don't get the next time slice
Title: Re: Multithreaded apps in 64 bit assembler
Post by: AKRichard on August 20, 2012, 05:47:33 PM
I have not figured out how to implement the Sleep function in 64-bit.  I've been using masm32 as a kind of reference to see how things are done when I cannot find a direct example of what I'm trying to do.

  I was wondering though, how hard is it to elevate a particular thread's priority level while it is in the memory routines?  From what I am reading on MSDN, it would seem raising the scheduling priority would make the Sleep function a moot point.  From the MSDN site:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx

Quote
The system treats all threads with the same priority as equal. The system assigns time slices in a round-robin fashion to all threads with the highest priority. If none of these threads are ready to run, the system assigns time slices in a round-robin fashion to all threads with the next highest priority. If a higher-priority thread becomes available to run, the system ceases to execute the lower-priority thread (without allowing it to finish using its time slice), and assigns a full time slice to the higher-priority thread. For more information, see Context Switches.

If I'm understanding that correctly, upping the priority would guarantee that that particular thread would run to completion through the memory routine without interruption on a single-core processor.  I don't quite have the memory management routines running as well as I would like yet, but after I do, I am going to work on this aspect of it, whether it be with the Sleep function, changing priority levels, or a combination thereof.  Reading through all the comments is worth its weight in gold, because I am finding answers to questions that I would likely have had once I started.

Quote
from hool:
I'm afraid you got lucky. CMPXCHG is very fast compared to switching threads, for example, so it would probably take a long time before your software starts misbehaving. Do use the LOCK prefix.

Ya, I actually caught that shortly after my last message on this thread.  I had thought it used the lock prefix implicitly.  I have explicitly used the lock prefix in it now.  I am not sure yet, but I think there is something wrong with how I implemented it: I am not getting the errors anymore, but every once in a great while an algorithm will return an incorrect result in multithreaded mode that I'm not getting in single-threaded mode, and if I feed the same values back into the algorithm it always gives the correct result on the second run.  I'm not sure if it's because multiple threads are getting through the memory routine at the same time or if I have a bug that is giving memory already assigned to multiple threads.  I'm pretty sure it's one or the other though.

Quote
from dedndave:
"as well", as in "also"   

when you use XCHG reg,mem, a LOCK prefix is implied
i.e., you do not have to explicitly use LOCK

  I dont know about the intel docs but in the amd docs:

Quote
Exchanges the contents of the two operands. The operands can be two general-purpose registers or a
register and a memory location. If either operand references memory, the processor locks
automatically, whether or not the LOCK prefix is used and independently of the value of IOPL. For
details about the LOCK prefix, see "Lock Prefix" on page 8.

It was this that led me to believe it was implied on the cmpxchg (I am actually using cmpxchg8b now).  I was wondering though, dedndave: in your post you used xchg (I figured you were using a cmp instruction not listed in your code).  Wouldn't that leave the possibility of 2 threads entering the function?  If one thread were to do its compare right after another thread had done its compare but before that thread executed the xchg instruction, then you'd have 2 threads in the same section of code you were trying to protect.  Or am I missing something?

  I wanted to make sure: local variables in a function are not shared, are they?  On a multiprocessor computer, if 2 processors happen to be in the same function at the same time, they both get their own copy of all locals, so that when one processor modifies a variable the other processor will not see that modification, correct?
Title: Re: Multithreaded apps in 64 bit assembler
Post by: sinsi on August 20, 2012, 09:34:29 PM
Have you looked at critical sections?
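
For readers unfamiliar with the idea, a critical section wraps the shared routine so that only one thread at a time is inside it. The sketch below uses std::mutex as a portable stand-in for a Windows CRITICAL_SECTION and guards a toy bump allocator. The class GuardedPool is hypothetical, ignores alignment, and is not the memory manager from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical bump allocator over a fixed pool, guarded by a single mutex.
class GuardedPool {
public:
    // Returns a pointer to `bytes` of pool memory, or nullptr when exhausted.
    void* allocate(std::size_t bytes) {
        std::lock_guard<std::mutex> guard(lock_);  // enter the critical section
        if (bytes > sizeof(pool_) - used_) return nullptr;
        void* p = pool_ + used_;
        used_ += bytes;
        return p;
    }  // guard's destructor leaves the critical section, even on early return

    std::size_t used() {
        std::lock_guard<std::mutex> guard(lock_);
        return used_;
    }

private:
    std::mutex lock_;
    unsigned char pool_[1024];
    std::size_t used_ = 0;
};
```

Because the bookkeeping (used_) is only read and written while the lock is held, two threads cannot be handed overlapping blocks.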
Title: Re: Multithreaded apps in 64 bit assembler
Post by: dedndave on August 21, 2012, 01:19:12 AM
http://masm32.com/board/index.php?topic=552.msg4682#msg4682 (http://masm32.com/board/index.php?topic=552.msg4682#msg4682)

there is a CMP, there
however, you already hold the semaphore
the one in memory is replaced with a 1
if another thread tries to access it, it will replace the 1 with a 1 via XCHG
when it examines it, it finds the semaphore is already owned by another thread

i don't use CMPXCHG, because it is not supported on older processors
and - i am not sure if it implies a LOCK, as XCHG does
Title: Re: Multithreaded apps in 64 bit assembler
Post by: AKRichard on August 21, 2012, 05:07:07 PM
Quote from: sinsi on August 20, 2012, 09:34:29 PM
Have you looked at critical sections?

  I had used critical sections when I was using inline assembly in a 32-bit build, but when I decided to try to move it to 64-bit, I figured I'd keep the assembly language to a minimum (and hopefully keep it simple) and just use it for the math, and didn't think I'd need synchronization.  Well, the simple (not so simple for me at least) has been going out the door as I keep pushing for faster algorithms.

Quote
from dedndave:

i don't use CMPXCHG, because it is not supported on older processors
and - i am not sure if it implies a LOCK, as XCHG does

I found out the hard way that it doesn't implicitly lock.  I also finally realized my bug was in that lock: with the cmpxchg8b instruction, if the compare fails (the values don't match), it moves the value from memory into the registers it was comparing against.  Therefore, on the second run through the loop, the values DID match, so it would do the exchange and let the thread through.  Once I moved the initialization code inside of the loop, the problem went away.  I read that part of the AMD doc 50 times and didn't realize it was a problem until I figured out how to trap it doing it.

  Good news though: I have it running more stably than Microsoft's BigInteger (which errors out with memory errors before my algorithms do now).  As for response times, my addition and subtraction routines are running just a hair slower than Microsoft's, but my multiplication and division algorithms are running a lot faster than theirs, and my exponentiation and modular exponentiation algorithms are looking pretty close to twice as fast for numbers with more than 50 elements in the array and exponents between 100 and 500.  My testing program isn't highly accurate (I'm just using the system clock), but it consistently shows those results.  I've got to thank you guys.  This library has been slowly evolving over the years and was never meant to become this large, but you guys got me past the block that kept me from even considering pushing the project further.  I'm even going to go back and clean up the code and put it out there for some thoughts on where I should have done things differently and how it can improve.  Thanks!
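
The bug described above can be reproduced in C++, because compare_exchange_strong behaves like CMPXCHG in this respect: on failure it overwrites the expected value with what is currently in memory. The sketch below is an illustration, not the original routine. The names try_enter, leave and buggy_enter_attempts are made up, and the flag stands in for the _gm variable (0 = free, 1 = held).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Correct attempt: the expected value is reloaded with 0 on every call,
// the equivalent of keeping "mov rax, 0" inside the retry loop.
bool try_enter(std::atomic<std::uint64_t>& flag) {
    std::uint64_t expected = 0;
    return flag.compare_exchange_strong(expected, 1, std::memory_order_acquire);
}

void leave(std::atomic<std::uint64_t>& flag) {
    flag.store(0, std::memory_order_release);
}

// Buggy pattern: the expected value is set once, outside the loop.
// Returns the attempt number on which it "entered", or 0 if it never did.
int buggy_enter_attempts(std::atomic<std::uint64_t>& flag, int max_attempts) {
    std::uint64_t expected = 0;  // the bug: never reset after a failure
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (flag.compare_exchange_strong(expected, 1, std::memory_order_acquire)) {
            return attempt;
        }
        // After a failed attempt, expected holds the flag's current value (1),
        // so the next compare matches even though the lock is still held.
    }
    return 0;
}
```

With the flag already held, buggy_enter_attempts gets in on its second attempt, which matches the rare wrong results reported above: a second thread walks into the memory routine while the first is still inside.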