The MASM Forum

General => The Workshop => Topic started by: daydreamer on August 22, 2023, 08:29:33 PM

Title: SIMT quest test run multiple fpus for better performance???
Post by: daydreamer on August 22, 2023, 08:29:33 PM
Hi
I tried the following FPU code in the worker thread proc:

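mov esi,iterations      ; loop counter
@@L1:
fld f1                  ; st(0) = f1
fdiv f2                 ; st(0) = f1 / f2
fstp fresult            ; store the result and pop the FPU stack
dec esi
jne @@L1                ; repeat until esi reaches 0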

but the result was very disappointing on my quad-core CPU when testing with 2 and 4 threads, probably because invoking CreateThread several times and then invoking WaitForMultipleObjects eats up the performance gain :(
Do I need to run it for half an hour or so, like multithreaded CG apps do, for it to be worth all those CreateThread calls?
Or use a low-level synchronized start, with several cores held in a kind of wait state?
And a low-level semaphore that reports when a thread has finished, instead of the APIs?

I want to test up to 16 threads, for those who have 16-core CPUs.
It's a lot easier to program the FPU, with its complete instruction set, than SSE.
Title: Re: SIMT quest test run multiple fpus for better performance???
Post by: jj2007 on August 22, 2023, 09:03:05 PM
fdiv f2 is slow. If possible, replace it with fmul (1/f2)
Title: Re: SIMT quest test run multiple fpus for better performance???
Post by: HSE on August 22, 2023, 09:53:18 PM
Quote from: daydreamer on August 22, 2023, 08:29:33 PM
Do I need to run it for half an hour or so, like multithreaded CG apps do, for it to be worth all those CreateThread calls?
Or use a low-level synchronized start, with several cores held in a kind of wait state?
And a low-level semaphore that reports when a thread has finished, instead of the APIs?

Could be.
Title: Re: SIMT quest test run multiple fpus for better performance???
Post by: jj2007 on August 22, 2023, 10:53:32 PM
You raise an interesting question, Magnus: Do multi core CPUs have an FPU for each core or do all cores share a single FPU? (https://www.quora.com/Do-multi-core-CPUs-have-an-FPU-for-each-core-or-do-all-cores-share-a-single-FPU)

Quote
It depends on the cpu.

Most modern CPUs have one per core, but historically, some designs have had shared FPUs. Most notably bulldozer which failed miserably partially due to that decision (And shared caches).

Quote
Each core is complete with its own FPU.

Don't confuse multi-core with hyperthreading - where this is not true.
Title: Re: SIMT quest test run multiple fpus for better performance???
Post by: daydreamer on August 24, 2023, 05:40:51 AM
Jochen,
The simplest use of a worker thread is a loop without API calls: you have the advantage of using the whole set of GP regs, all XMM regs and 7 FP regs, whereas a Windows WndProc can be filled with lots of API calls.
One candidate for testing is a dedicated worker thread that draws using GetDC/ReleaseDC, versus WM_PAINT when the WndProc is filled with lots of GUI code.

Title: Re: SIMT quest test run multiple fpus for better performance???
Post by: NoCforMe on August 24, 2023, 01:15:37 PM
I think you just answered a question he did not ask.