SIMT quest test run multiple fpus for better performance???

Started by daydreamer, August 22, 2023, 08:29:33 PM


daydreamer

Hi,
I have tried this FPU code in the worker thread proc:

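mov esi,iterations   ; loop counter
@@L1:
fld f1               ; st(0) = f1
fdiv f2              ; st(0) = f1 / f2
fstp fresult         ; store the quotient and pop the FPU stack
dec esi
jne @@L1             ; repeat until esi reaches zero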

but the results were very disappointing on my quad-core CPU when testing 2 threads and 4 threads, probably because invoking CreateThread several times and invoking WaitForMultipleObjects eats up the performance gain :(
Do I need to run it for half an hour or so, like multithreaded Cg apps do, for it to be worth all that CreateThread overhead?
Or use a low-level synchronized start, with several cores kept in a kind of wait state?
Or a low-level semaphore that reports a thread has finished, instead of APIs?
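
One way to check whether thread creation and waiting are really what eats the gain is to time the whole run, from the first CreateThread to the return of WaitForMultipleObjects, and compare builds with different thread counts. The following is only a minimal sketch under stated assumptions, not the code from this thread: it assumes the MASM32 SDK (masm32rt.inc) and a console build, and NTHREADS, the operand values, the iteration count and the label names are all made up for illustration. Each thread stores into a LOCAL instead of a shared fresult, so the threads do not all write to the same memory location.

```asm
include \masm32\include\masm32rt.inc

NTHREADS equ 4                      ; hypothetical thread count, e.g. one per core

.data
f1          REAL8 10.0              ; sample operands, values are assumptions
f2          REAL8 4.0
iterations  dd 100000000            ; loop count per thread, an assumption

.data?
hThreads    dd NTHREADS dup(?)      ; thread handles for WaitForMultipleObjects

.code
WorkerThread proc uses esi lpParam:DWORD
    LOCAL fres:REAL8                ; result lives on this thread's own stack
    mov esi, iterations
@@:
    fld f1
    fdiv f2
    fstp fres
    dec esi
    jnz @B
    xor eax, eax                    ; thread exit code 0
    ret
WorkerThread endp

start:
    invoke GetTickCount
    mov ebx, eax                    ; start time in ms
    xor esi, esi                    ; thread index
spawn:
    invoke CreateThread, NULL, 0, ADDR WorkerThread, NULL, 0, NULL
    mov hThreads[esi*4], eax        ; no error check here, this is a sketch
    inc esi
    cmp esi, NTHREADS
    jb spawn

    ; block until every worker has returned
    invoke WaitForMultipleObjects, NTHREADS, ADDR hThreads, TRUE, INFINITE
    invoke GetTickCount
    sub eax, ebx                    ; elapsed ms, creation and wait included
    print str$(eax), " ms", 13, 10

    xor esi, esi
closeall:
    invoke CloseHandle, hThreads[esi*4]
    inc esi
    cmp esi, NTHREADS
    jb closeall
    invoke ExitProcess, 0
end start
```

Every thread runs the full iteration count here, so if the cores scale well, the elapsed time with NTHREADS set to 4 should stay close to the time with NTHREADS set to 1. GetTickCount only resolves to roughly 10 to 16 ms, so the loop has to run long enough for that not to matter.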

I want to test up to 16 threads for those who have 16-core CPUs.
It's a lot easier to program the FPU, with its complete instruction set, compared to SSE.
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

jj2007

fdiv f2 is slow. If possible, replace it with fmul (1/f2)
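
The suggestion above can be sketched as follows: divide once before the loop to get 1/f2, then multiply inside the loop. This is a minimal sketch, assuming the MASM32 SDK (masm32rt.inc) and REAL8 variables named as in the first post; frecip is a hypothetical helper variable, and the operand values and iteration count are made up. Multiplying by a rounded reciprocal can differ from a true division in the last bit, so it only fits where that is acceptable.

```asm
include \masm32\include\masm32rt.inc

.data
f1          REAL8 10.0              ; sample operands, values are assumptions
f2          REAL8 4.0
iterations  dd 100000000            ; loop count, an assumption

.data?
frecip      REAL8 ?                 ; hypothetical helper: holds 1/f2
fresult     REAL8 ?

.code
start:
    fld1                            ; st(0) = 1.0
    fdiv f2                         ; st(0) = 1.0 / f2, the only division
    fstp frecip                     ; store the reciprocal once

    mov esi, iterations
@@:
    fld f1
    fmul frecip                     ; f1 * (1/f2) instead of f1 / f2
    fstp fresult
    dec esi
    jnz @B

    invoke ExitProcess, 0
end start
```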

HSE

Quote from: daydreamer on August 22, 2023, 08:29:33 PM
Do I need to run it for half an hour or so, like multithreaded Cg apps do, for it to be worth all that CreateThread overhead?
Or use a low-level synchronized start, with several cores kept in a kind of wait state?
Or a low-level semaphore that reports a thread has finished, instead of APIs?

Could be.
Equations in Assembly: SmplMath

jj2007

You raise an interesting question, Magnus: Do multi core CPUs have an FPU for each core or do all cores share a single FPU?

Quote
It depends on the CPU.

Most modern CPUs have one per core, but historically, some designs have had shared FPUs. Most notably Bulldozer, which failed miserably, partially due to that decision (and shared caches).

Quote
Each core is complete with its own FPU.

Don't confuse multi-core with hyperthreading - where this is not true.

daydreamer

Jochen,
The simplest use of a worker thread is a loop without API calls: you have the advantage of using the whole set of GP regs, all XMM regs and 7 FP regs, compared to a Windows WndProc, which can be filled with lots of API calls.
One candidate for testing is a dedicated worker thread for drawing, using GetDC/ReleaseDC, versus WM_PAINT when the WndProc is filled with lots of GUI code.


NoCforMe

Assembly language programming should be fun. That's why I do it.