Multithreaded apps in 64 bit assembler

dedndave · August 20, 2012, 03:28:54 AM

that is pretty much how i interpret it, too
although - the operating system supposedly gives a slightly higher priority to the process of the foreground window
that changes the overall picture a little
in all of our testing, i think we have had the foreground window
i don't think i've seen a case where you don't get the next time slice

AKRichard · August 20, 2012, 05:47:33 PM

I have not figured out how to call the Sleep function in 64 bit. I've been using masm32 as a kind of reference to see how things are done when I cannot find a direct example of what I'm trying to do.

I was wondering though, how hard is it to elevate a particular thread's priority level while it is in the memory routines? From what I am reading on MSDN, it would seem raising the scheduling priority would make the sleep function a moot point. From the MSDN site:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx

Quote
The system treats all threads with the same priority as equal. The system assigns time slices in a round-robin fashion to all threads with the highest priority. If none of these threads are ready to run, the system assigns time slices in a round-robin fashion to all threads with the next highest priority. If a higher-priority thread becomes available to run, the system ceases to execute the lower-priority thread (without allowing it to finish using its time slice), and assigns a full time slice to the higher-priority thread. For more information, see Context Switches.

If I'm understanding that correctly, upping the priority would guarantee that that particular thread runs through the memory routine to completion without interruption on a single-core processor. I don't quite have the memory management routines running as well as I would like yet, but once I do, I am going to work on this aspect of it, whether with the sleep function, changing priority levels, or a combination thereof. Reading through all the comments is worth its weight in gold, because I am finding answers to questions that I would likely have had once I started.

Quote
from hool:
I'm afraid you got lucky. Cmpxchg is very fast comparing to switching threads for example, and it would probably take long time before your software starts misbehaving. Do use Lock prefix.

Ya, I actually caught that shortly after my last message on this thread. I had thought it used the lock prefix implicitly. I have explicitly used the lock prefix in it now. I am not sure yet, but I think there is something wrong with how I implemented it. I am not getting the errors anymore, but every once in a great while an algorithm will return an incorrect result in multithreaded mode that I'm not getting in single-threaded mode, and if I feed the same values back into the algorithm it always gives the correct result on the second run. I'm not sure if it's because multiple threads are getting through the memory routine at the same time or if I have a bug that hands already-assigned memory to multiple threads. I'm pretty sure it's one or the other though.

Quote
from dedndave:
"as well", as in "also"

when you use XCHG reg,mem, a LOCK prefix is implied
i.e., you do not have to explicitly use LOCK

I don't know about the Intel docs, but in the AMD docs:

Quote
Exchanges the contents of the two operands. The operands can be two general-purpose registers or a
register and a memory location. If either operand references memory, the processor locks
automatically, whether or not the LOCK prefix is used and independently of the value of IOPL. For
details about the LOCK prefix, see "Lock Prefix" on page 8.

It was this that led me to believe it was implied on cmpxchg (I am actually using cmpxchg8b now). I was wondering though, dedndave: in your post you used xchg (I figured you were using a cmp instruction not listed in your code). Wouldn't that leave the possibility of 2 threads entering the function? If one thread were to compare right after another thread had done its compare, but before it executed the xchg instruction, then you'd have 2 threads in the same section of code you were trying to protect. Or am I missing something?

I wanted to make sure: local variables in a function are not shared, are they? On a multiprocessor computer, if 2 processors happen to be in the same function at the same time, they both get their own copy of all locals, so that when one processor modifies a variable the other processor will not see that modification, correct?
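
For the question about locals, here is a minimal sketch in C++ (not the assembler used in this thread). It relies on the fact that each thread has its own stack, so locals that live on the stack or in registers are private to the thread running the function. Statics, globals and anything reached through a shared pointer are still shared. The names sum_up_to and run_two_threads are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical worker: local_sum lives on the calling thread's own stack,
// so two threads inside this function at once never see each other's copy.
void sum_up_to(int n, long long* out) {
    long long local_sum = 0;          // private to this call
    for (int i = 1; i <= n; ++i) {
        local_sum += i;
    }
    *out = local_sum;                 // publish the result once, at the end
}

// Runs the same function on two threads at the same time and
// returns both results.
std::vector<long long> run_two_threads() {
    long long a = 0;
    long long b = 0;
    std::thread t1(sum_up_to, 1000, &a);
    std::thread t2(sum_up_to, 2000, &b);
    t1.join();
    t2.join();
    return {a, b};
}
```

Each thread writes only to its own output slot, so no synchronization is needed beyond join().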

sinsi · August 20, 2012, 09:34:29 PM

Have you looked at critical sections?

dedndave · August 21, 2012, 01:19:12 AM

http://masm32.com/board/index.php?topic=552.msg4682#msg4682

there is a CMP, there
however, you already hold the semaphore
the one in memory is replaced with a 1
if another thread tries to access it, it will replace the 1 with a 1 via XCHG
when it examines it, it finds the semaphore is already owned by another thread

i don't use CMPXCHG, because it is not supported on older processors
and - i am not sure if it implies a LOCK, as XCHG does
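
The XCHG-based semaphore described above can be sketched in C++ (not the original assembler). The sketch assumes that std::atomic's exchange() is compiled to XCHG with a memory operand on x86-64, which is typical but depends on the compiler. The names semaphore, acquire, release, worker and run_counter_test are hypothetical. The point it illustrates: the write of 1 and the read of the old value happen in a single atomic step, so there is no gap between a "compare" and an "exchange" for a second thread to slip through.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> semaphore{0};   // 0 = free, 1 = owned
long long shared_counter = 0;    // only touched while the semaphore is owned

void acquire() {
    // exchange() stores 1 and returns the previous value atomically.
    // A thread that gets 1 back has only replaced a 1 with a 1,
    // so it knows another thread already owns the semaphore.
    while (semaphore.exchange(1, std::memory_order_acquire) != 0) {
        std::this_thread::yield();   // owned by someone else, retry later
    }
}

void release() {
    semaphore.store(0, std::memory_order_release);
}

void worker(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        acquire();
        ++shared_counter;            // protected section
        release();
    }
}

// Two threads increment the counter 100000 times each under the lock.
long long run_counter_test() {
    shared_counter = 0;
    std::thread t1(worker, 100000);
    std::thread t2(worker, 100000);
    t1.join();
    t2.join();
    return shared_counter;
}
```

If the lock works, run_counter_test() returns exactly 200000, because no increment is lost.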

AKRichard · August 21, 2012, 05:07:07 PM

Quote from: sinsi on August 20, 2012, 09:34:29 PM
Have you looked at critical sections?

I had used critical sections when I was using inline assembly in a 32 bit build, but when I decided to move it to 64 bit, I figured I'd keep the assembly language to a minimum (and hopefully keep it simple), just use it for the math, and didn't think I'd need synchronization. Well, the simple (not so simple for me, at least) has been going out the door as I keep pushing for faster algorithms.

Quote
from dedndave:

i don't use CMPXCHG, because it is not supported on older processors
and - i am not sure if it implies a LOCK, as XCHG does

I found out the hard way that it doesn't implicitly lock. I also finally realized my bug was in that lock: on the cmpxchg8b instruction, if the compare fails (the values don't match), it moves the value from memory into the registers it was comparing against (EDX:EAX). Therefore, on the second run through the loop, the values DID match, so it would do the exchange and let the thread through. Once I moved the initialization code inside the loop, the problem went away. I read that part of the AMD doc 50 times and didn't realize that was a problem until I figured out how to trap it doing it.
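
The bug described above can be reproduced in C++ (a sketch, not the original assembler). std::atomic's compare_exchange_strong behaves like CMPXCHG in the relevant way: when the compare fails, it overwrites the "expected" value with what is currently in memory. The function names try_lock_buggy and try_lock_fixed are hypothetical, and the two-attempt limit exists only to keep the demonstration deterministic.

```cpp
#include <atomic>
#include <cassert>

// Buggy pattern: expected is initialised once, outside the retry loop.
// Returns true if the lock appears to have been acquired.
bool try_lock_buggy(std::atomic<int>& lock_word) {
    int expected = 0;                        // set only once
    for (int attempt = 0; attempt < 2; ++attempt) {
        if (lock_word.compare_exchange_strong(expected, 1)) {
            return true;
        }
        // The failed compare has copied the current lock value (1) into
        // expected, so the next pass compares 1 against 1 and succeeds.
    }
    return false;
}

// Fixed pattern: expected is reset to 0 (free) on every pass.
bool try_lock_fixed(std::atomic<int>& lock_word) {
    for (int attempt = 0; attempt < 2; ++attempt) {
        int expected = 0;                    // re-initialised inside the loop
        if (lock_word.compare_exchange_strong(expected, 1)) {
            return true;
        }
    }
    return false;
}
```

With a lock word that is already 1 (owned), try_lock_buggy reports success on its second attempt even though nothing released the lock, while try_lock_fixed keeps failing as it should.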

Good news though: I have it running more stably than Microsoft's BigInteger (which errors out with memory errors before my algorithms do now). As for response times, my addition and subtraction routines are running just a hair slower than Microsoft's, but my multiplication and division algorithms are running a lot faster than theirs, and my exponentiation and modular exponentiation algorithms are looking pretty close to twice as fast for numbers with more than 50 elements in the array and exponents between 100 and 500. My testing program isn't highly accurate, I'm just using the system clock, but it consistently shows those results. I've got to thank you guys. This library has been slowly evolving over the years and was never meant to become this large, but you guys got me past the block that kept me from even considering pushing the project further. I'm even going to go back and clean up the code and put it out there for some thoughts on where I should have done things differently and how it can improve. Thanks!
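
On the timing side, one common way to get steadier numbers than a single system-clock reading is to time several runs with a monotonic clock and keep the best one. The sketch below is in C++ and is only an illustration of that idea. It is not the test program described above, and work() is a hypothetical stand-in for the routine being measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the routine under test.
std::uint64_t work(std::uint64_t n) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        acc += i * i;
    }
    return acc;
}

// Times work(n) `runs` times with a monotonic clock and returns the
// smallest elapsed time in nanoseconds (or -1 if runs is 0 or less).
long long best_time_ns(int runs, std::uint64_t n) {
    long long best = -1;
    volatile std::uint64_t sink = 0;   // stops the compiler discarding work()
    for (int r = 0; r < runs; ++r) {
        auto start = std::chrono::steady_clock::now();
        sink = work(n);
        auto stop = std::chrono::steady_clock::now();
        long long ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                           stop - start).count();
        if (best < 0 || ns < best) {
            best = ns;
        }
    }
    (void)sink;
    return best;
}
```

Taking the minimum over several runs filters out most of the noise from context switches and cache warm-up, which matters when two routines differ by only "a hair".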
