Multithreading and the FPU

Started by Biterider, November 09, 2024, 06:22:42 AM


Biterider

Hi
I've been wondering about a technology question. 
Most current CPUs have a dedicated FPU per physical core, which means we can theoretically run each FPU from different threads in parallel and speed up floating point calculations significantly.
There are still some limits, like memory access, caching, bus bandwidth, etc., but there should still be a noticeable effect.

I've never seen any publication, article or post doing something like that. Does anyone know anything about it, or has anyone looked into it?
I can do my own research, but I thought I'd check it first, just to make sure I'm on the right track.  :biggrin:
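
The idea in this post can be sketched as follows: split a floating-point workload into chunks and let one thread per chunk compute a partial result. This is only a minimal illustration in C++ with std::thread, not code from the thread; the kernel `partial_sum` and the helper `parallel_sum` are hypothetical names chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical floating-point kernel: sum of sin(x) * cos(x) over [begin, end).
double partial_sum(const std::vector<double>& data, std::size_t begin, std::size_t end) {
    double acc = 0.0;
    for (std::size_t i = begin; i < end; ++i)
        acc += std::sin(data[i]) * std::cos(data[i]);
    return acc;
}

// Split the range into thread_count chunks and compute each chunk on its own thread.
double parallel_sum(const std::vector<double>& data, unsigned thread_count) {
    assert(thread_count > 0);
    std::vector<double> results(thread_count, 0.0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + thread_count - 1) / thread_count;

    for (unsigned t = 0; t < thread_count; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        // Each thread accumulates locally and writes its own slot exactly once.
        workers.emplace_back([&data, &results, t, begin, end] {
            results[t] = partial_sum(data, begin, end);
        });
    }
    for (auto& w : workers) w.join();

    // Combine the partial results on the calling thread.
    double total = 0.0;
    for (double r : results) total += r;
    return total;
}
```

Whether this is faster than the sequential loop depends on the chunk size: for small inputs the cost of starting and joining the threads can easily exceed the time saved.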

Biterider

NoCforMe

Hmm; I can't say for certain, but wouldn't the burden of proof here be on showing that one wouldn't be able to use an FPU in a different thread? If there are multiple FPUs, then it seems reasonable to assume that each one should be able to be used in a separate thread. Why would you be forced to use all of them in a single thread?

I could be wrong about that ...
Assembly language programming should be fun. That's why I do it.

HSE

Hi Biterider,

Quote from: Biterider on November 09, 2024, 06:22:42 AM
"There are still some limits, like memory access, caching, bus bandwidth, etc., but there should still be a noticeable effect."

Yes, but you need enough cores, because there is overhead.

Some time ago I experimented with this from UEFI on a 4-core machine, but it turned out that only 2 cores were really available, so there was no benefit: one core controlled the process and the other ran the threads  :biggrin:  :biggrin:

Anyway, you must transform the problem, and some problems may not be well suited to that.

Quote from: Biterider on November 09, 2024, 06:22:42 AM
"I've never seen any publication, article or post"

Historically, distributed computing came first, because the threads were going to run on different machines. So work specific to multicore FPU use on a single processor is probably not of much interest for big problems.


You can ask Gunther, who works in the CERN computer farm with 30000 cores, that is interesting  :thumbsup:


HSE

Equations in Assembly: SmplMath

Biterider

Hi HSE
Quote from: HSE on Today at 12:16:41 AM
"... in the CERN computer farm with 30000 cores, that is interesting  :thumbsup:"
My goals are much more modest than that.
I thought I would give it a try with a 3D animation application that moves some objects around sequentially, using only the CPU and FPU, like in the old days.
Calculating the movements, the scene and the camera is really FPU intensive. 
I created 2 threads, one for each movement (2 objects moving), but the result was disappointing.  :nie:
The new version was slightly slower than the sequential one.

My guess is that the cost of thread creation, destruction and synchronisation outweighs the speed gain.
I can see a ThreadPool being the way to go, but I need to code it first.
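
A thread pool avoids paying for thread creation and destruction on every frame: the workers are started once and then only woken up when there is work. The following is a minimal sketch in C++, not the implementation discussed in the thread; the class `ThreadPool` and its methods `submit` and `wait_idle` are hypothetical names for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    // Start the workers once; they live until the pool is destroyed.
    explicit ThreadPool(std::size_t worker_count) {
        assert(worker_count > 0);
        for (std::size_t i = 0; i < worker_count; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_workers_.notify_all();
        for (auto& w : workers_) w.join();
    }
    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queue a job; one sleeping worker is woken to run it.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
            ++pending_;
        }
        wake_workers_.notify_one();
    }
    // Block until every submitted job has finished (for example at the end of a frame).
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mutex_);
        all_done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_workers_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run the job outside the lock
            {
                std::lock_guard<std::mutex> lock(mutex_);
                --pending_;
            }
            all_done_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_workers_;
    std::condition_variable all_done_;
    std::queue<std::function<void()>> jobs_;
    std::size_t pending_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

For the two-object case, one would submit one movement job per object each frame and call `wait_idle()` before drawing. The synchronisation cost per frame remains, so the jobs still need to be large enough to be worth handing off.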

Biterider


HSE

Quote from: Biterider on Today at 02:56:49 AM
"My goals are much more modest than that."

:biggrin:

How many cores do you have?

Quote from: Biterider on Today at 02:56:49 AM
"My guess is that the creation, destruction and synchronisation outweighs the speed gain."

I think it can also happen that the OS runs the threads on the same core, so you end up with the same code plus context switching.

It is not so easy to know what the OS is doing  :biggrin:
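
One thing a program can check cheaply is how many logical processors the runtime reports, and then avoid starting more workers than that. The sketch below uses the standard C++ call `std::thread::hardware_concurrency()`; the helper `pick_worker_count` is a hypothetical name for this example. It does not control which core the OS actually schedules each thread on, and the reported value counts logical processors, which on CPUs with simultaneous multithreading is larger than the number of physical cores.

```cpp
#include <cassert>
#include <thread>

// Choose how many worker threads to start for FPU-bound work.
// hardware_concurrency() is only a hint and may return 0 when the value is unknown.
unsigned pick_worker_count(unsigned requested) {
    assert(requested > 0);
    const unsigned logical = std::thread::hardware_concurrency();
    if (logical == 0) return 1;  // unknown: fall back to a single worker
    return requested < logical ? requested : logical;
}
```

Pinning threads to specific cores requires OS-specific calls and is not shown here.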

Quote from: Biterider on Today at 02:56:49 AM
"I can see a ThreadPool being the way to go, but I need to code it first."

 :thumbsup:

NoCforMe

Found this on Raymond Chen's blog. Dunno if it's of any help here. There may be more there, but this was all I could find in an initial search.