Hi
I have tried FPU code in the worker thread proc:
mov esi,iterations
@@L1:
fld f1
fdiv f2
fstp fresult
dec esi
jne @@L1
but I got very disappointing results on my quad-core CPU when testing 2 threads and 4 threads, probably because the several CreateThread calls and the WaitForMultipleObjects call eat up the performance gain :(
Do I need to run it for half an hour or so, like multithreaded CG apps do, for all that CreateThread overhead to be worth it?
Would a low-level synchronized start, with several cores held in a kind of wait state, help?
Or a low-level semaphore that reports a thread has finished, instead of the APIs?
I want to test up to 16 threads for those who have 16-core CPUs.
It's a lot easier to program the FPU, with its complete instruction set, compared to SSE.
fdiv f2 is slow. If possible, replace it with fmul (1/f2)
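A minimal sketch of that replacement, applied to the loop from the first post. The variable f2inv is hypothetical (it is not in the original code), and f1, f2, f2inv and fresult are assumed to be REAL4 or REAL8 memory variables. Multiplying by a rounded reciprocal can differ from a true division in the last bit, so this only fits where that is acceptable.

```asm
; compute the reciprocal once, outside the loop
fld1                ; st(0) = 1.0
fdiv f2             ; st(0) = 1.0 / f2
fstp f2inv          ; f2inv: hypothetical variable holding 1/f2
mov esi,iterations
@@L1:
fld f1
fmul f2inv          ; one multiply per iteration instead of one divide
fstp fresult
dec esi
jne @@L1
```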
Quote from: daydreamer on August 22, 2023, 08:29:33 PM
Do I need to run it for half an hour or so, like multithreaded CG apps do, for all that CreateThread overhead to be worth it?
Would a low-level synchronized start, with several cores held in a kind of wait state, help?
Or a low-level semaphore that reports a thread has finished, instead of the APIs?
Could be.
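For what it's worth, a low-level "thread finished" signal of the kind asked about could be as small as a shared counter. This is only a sketch under assumptions: doneCount (a DWORD initialised to 0) and NUM_THREADS (a constant) are hypothetical names, and the assembler is assumed to accept pause (in MASM, for example, with .686 and .xmm). The waiting thread keeps a core busy while it spins, so this is only likely to pay off for short waits.

```asm
; at the end of each worker thread, just before its normal return
lock inc doneCount      ; atomically report "this thread is finished"

; in the thread that waits for the workers
@@spin:
pause                   ; spin-wait hint to the CPU
mov eax,doneCount
cmp eax,NUM_THREADS
jb @@spin               ; loop until every worker has reported
```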
You raise an interesting question, Magnus: Do multi core CPUs have an FPU for each core or do all cores share a single FPU? (https://www.quora.com/Do-multi-core-CPUs-have-an-FPU-for-each-core-or-do-all-cores-share-a-single-FPU)
Quote: It depends on the CPU.
Most modern CPUs have one per core, but historically, some designs have had shared FPUs. Most notably Bulldozer, which failed miserably partially due to that decision (and shared caches).
Quote: Each core is complete with its own FPU.
Don't confuse multi-core with hyperthreading - where this is not true.
Jochen
The simplest use of a worker thread is a loop without API calls: you have the advantage of using the whole set of GP regs, all XMM regs and 7 FP regs, compared to a Windows WndProc, which can be filled with lots of API calls.
One candidate for testing is a dedicated worker thread for drawing with GetDC/ReleaseDC vs. WM_PAINT when the WndProc is filled with lots of GUI code.
I think you just answered a question he did not ask.