SIMT challenge

Started by daydreamer, December 15, 2024, 10:41:52 PM


daydreamer

I think I'll return to the SIMT task that I earlier solved with SIMD + LUT, and follow the student rules to solve it with SIMT.
First, a way to get the number of cores on the computer, either with an API call or by letting the user enter it.
1: generate x million random numbers
Probably by calling rand in a loop
2: check whether each one is a prime number
3: add those prime numbers together
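
As a single-threaded baseline to compare the threaded versions against, the three steps above could be sketched in C like this. The function names are made up for illustration, and trial division is only one possible prime test, not necessarily the one the task expects.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Step 1: fill buf with n pseudo-random numbers by calling rand() in a loop.
   rand() only guarantees values up to RAND_MAX, which may be as small as 32767. */
static void fill_random(uint32_t *buf, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint32_t)rand();
}

/* Step 2: trial division by 2 and then by odd numbers up to sqrt(v). */
static int is_prime(uint32_t v)
{
    if (v < 2) return 0;
    if (v % 2 == 0) return v == 2;
    for (uint32_t d = 3; (uint64_t)d * d <= v; d += 2)
        if (v % d == 0) return 0;
    return 1;
}

/* Step 3: add the primes together; 64-bit sum so millions of values cannot overflow. */
static uint64_t sum_primes(const uint32_t *buf, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        if (is_prime(buf[i]))
            sum += buf[i];
    return sum;
}
```

Calling fill_random and then sum_primes on the same buffer gives a reference result that any multithreaded version should reproduce for the same input.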

Main starts x threads and sends 1, 2 or 3 to each thread.
On a 16-core CPU, how many threads get 1, 2 or 3 depends on which task is the most computationally expensive and which is the least.
Each thread starts with Switch/case
Case 1
Invoke random
Case 2
Invoke primes
Case 3
Invoke sum
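
A rough C sketch of that per-thread switch/case, under some assumptions: the job struct, the task names and the three helper procs are hypothetical, the LCG only stands in for rand, and the primes step zeroes non-primes in place so the sum step can simply add everything. Thread creation itself is left out; worker has the shape of a pthread start routine, and a CreateThread version would need a DWORD WINAPI signature instead.

```c
#include <stddef.h>
#include <stdint.h>

enum task { TASK_RANDOM = 1, TASK_PRIMES = 2, TASK_SUM = 3 };

/* Hypothetical per-thread job description, filled in by main. */
struct job {
    enum task task;    /* the 1, 2 or 3 sent by main */
    uint32_t *data;    /* this thread's slice of the array */
    size_t    count;
    uint32_t  seed;    /* used by TASK_RANDOM */
    uint64_t  sum;     /* written by TASK_SUM */
};

static void do_random(struct job *j)
{
    uint32_t x = j->seed;
    for (size_t i = 0; i < j->count; i++) {
        x = x * 1664525u + 1013904223u;   /* simple LCG standing in for rand */
        j->data[i] = x >> 8;              /* drop the weak low bits */
    }
}

static void do_primes(struct job *j)      /* zero every non-prime in place */
{
    for (size_t i = 0; i < j->count; i++) {
        uint32_t v = j->data[i];
        int prime = v >= 2;
        for (uint32_t d = 2; prime && (uint64_t)d * d <= v; d++)
            if (v % d == 0) prime = 0;
        if (!prime) j->data[i] = 0;
    }
}

static void do_sum(struct job *j)
{
    j->sum = 0;
    for (size_t i = 0; i < j->count; i++)
        j->sum += j->data[i];
}

/* Thread entry point: pick the proc from the task number. */
static void *worker(void *arg)
{
    struct job *j = arg;
    switch (j->task) {
    case TASK_RANDOM: do_random(j); break;
    case TASK_PRIMES: do_primes(j); break;
    case TASK_SUM:    do_sum(j);    break;
    }
    return NULL;
}
```

Note that task 3 only gives the right answer on a slice where task 2 has already finished, so main has to order or synchronize the stages.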

One option: a thread returns when it has finished task 1 or 2, and then joins in helping other threads that aren't finished yet.
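
One common way to get that "finished threads help the others" effect is a shared chunk counter instead of fixed slices. This is a swapped-in technique, not something from the task description: every thread keeps claiming the next unclaimed chunk until none are left, so a fast thread automatically takes over work from a slow one. The struct, the function name and the chunk size are assumptions for the sketch.

```c
#include <stdatomic.h>
#include <stddef.h>

#define CHUNK 4096   /* elements per claim; an arbitrary choice */

struct work_queue {
    atomic_size_t next;    /* index of the first element nobody has claimed yet */
    size_t        total;   /* number of elements in the whole array */
};

/* Claim the next chunk [*begin, *end). Returns 0 once all work is handed out.
   atomic_fetch_add makes sure two threads never get the same chunk. */
static int claim_chunk(struct work_queue *q, size_t *begin, size_t *end)
{
    size_t b = atomic_fetch_add(&q->next, CHUNK);
    if (b >= q->total)
        return 0;
    *begin = b;
    *end = (q->total - b < CHUNK) ? q->total : b + CHUNK;
    return 1;
}
```

Each thread would loop on claim_chunk and process its range; when it returns 0 the thread is done and there is nothing left to help with.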

Another idea:
Use two threads for the random numbers: one thread starts from the beginning of the array and the second thread starts from the end of the array, and they stop when they reach the same address.
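
A sketch of the two-ended fill in C. Comparing two moving addresses from two threads is racy, so this version assumes a shared atomic count of claimed slots instead: exactly n claims succeed in total, so the front and back writers can never overlap. The names and the LCG are made up for the example.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct two_ended {
    uint32_t     *data;
    size_t        n;
    atomic_size_t claimed;   /* slots handed out so far, by both threads together */
};

/* from_front != 0: fill upward from data[0]; otherwise downward from data[n-1].
   Returns how many slots this caller wrote. */
static size_t fill_from_one_end(struct two_ended *t, int from_front, uint32_t seed)
{
    size_t   mine = 0;
    uint32_t x = seed;
    while (atomic_fetch_add(&t->claimed, 1) < t->n) {
        x = x * 1664525u + 1013904223u;             /* simple LCG standing in for rand */
        size_t i = from_front ? mine : t->n - 1 - mine;
        t->data[i] = x;
        mine++;
    }
    return mine;
}
```

Run with from_front = 1 in one thread and from_front = 0 in the other; the two return values always add up to n, wherever the threads happen to meet.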

my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

NoCforMe

I don't get it; what is the point of this whole exercise?
Assembly language programming should be fun. That's why I do it.

daydreamer

When even old computers have 4 or more cores, it's easier to write several threads of standard x86 general-purpose register code than to use exotic SIMD opcodes. It's also more flexible: each core can run a different algo/proc, while 128-bit SSE runs the same algo on 4 x 32-bit values.

I have 4 cores on my newest computer and want to try to learn to use all 4.

Siekmanski

Quote from: NoCforMe on December 16, 2024, 08:42:30 AM
I don't get it; what is the point of this whole exercise?

A challenge to excel in programming by using the logical and analytical part of your brain.  :cool:
Creative coders use backward thinking techniques as a strategy.

zedd151

All I see here is a word salad.  :biggrin:

And only a 'concept' of an 'idea'.   :tongue:

daydreamer

I have no scalar prime-testing code yet (sieve and divide). I only have experimental SSE code: REAL4s tested with many divps, dividing by 2.0 up to around 300.0, and it has a bug:
it can't detect all of the low primes in the 2.0-300.0 range, and it's limited to the REAL4 range of about 9 million.
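
For the scalar "sieve" half, a plain Sieve of Eratosthenes in C could serve as a reference to check the SSE results against. This is only a sketch with one byte per number; make_sieve is a made-up name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'ed table with table[n] == 1 when n is prime, for 0 <= n <= limit.
   Returns NULL if the allocation fails. The caller frees the table. */
static uint8_t *make_sieve(uint32_t limit)
{
    uint8_t *t = malloc((size_t)limit + 1);
    if (!t)
        return NULL;
    memset(t, 1, (size_t)limit + 1);
    t[0] = 0;
    if (limit >= 1)
        t[1] = 0;
    for (uint64_t p = 2; p * p <= limit; p++)
        if (t[p])
            for (uint64_t m = p * p; m <= limit; m += p)
                t[m] = 0;    /* every multiple of a prime is composite */
    return t;
}
```

With make_sieve(300), the table also covers the low 2-300 range that the divps version misses, so the two can be compared number by number.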

 


sinsi

Some older (but still recent) CPUs would reduce the clock if full AVX was used on multi cores because of the heat.
With Intel CPUs you also need to pick a CPU that's not hyper-threaded.

Multithreading isn't easy, that's why most apps don't use it.
Also, CPU multithreading is a whole lot different to using an OS's thread functions.

daydreamer

Sinsi, I don't want to end up stuck in single-core coding in the future: 1 of 192 cores = 0.5% CPU usage : Hall of shame  :sad:

Gonna start coding/debugging now.
Ideas for several different procs that might be used on several different threads/cores:
 
SIMT exercise
xorps xmm7,xmm7 ; xmm7 = 0.0 in all 4 floats
movaps sums,xmm7 ; clear the 4 running sums

1: ; random numbers, 4 floats at a time
movaps xmm0,start
mulps xmm0,seeds ; start * seeds
addps xmm0,extra ; + extra
movaps result,xmm0 ; store the 4 new numbers
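2b: ; compare the 4 numbers against a list of primes
L2b:
movaps xmm0,result
movss xmm1,[ebx*4+primelist] ; next prime from the list
shufps xmm1,xmm1,0 ; copy it to all 4 floats
subps xmm0,xmm1 ; a float becomes 0.0 where the number equals the prime
comiss xmm0,zeros ; test float 0
jne
psrldq xmm0,4 ; shift xmm0 right 32 bits to the next float
comiss xmm0,zeros ; test float 1
psrldq xmm0,4
comiss xmm0,zeros ; test float 2
psrldq xmm0,4
comiss xmm0,zeros ; test float 3
add ebx,1 ; next entry in the prime list
add ecx,1
cmp ecx,length of primelist
jne L2b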
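2c: ; byte LUT with xlat
primelut db 0,0,2,3,0,5,0,7,0,0,0,11,0,13,... ; entry n = n if n is prime, else 0
mov ebx,offset primelut ; xlat uses ebx as the table base
mov ecx,number of random numbers
xor edx,edx ; running sum
L2c:
xor eax,eax ; clear the upper bits before add edx,eax
mov al,random byte
xlat ; al = primelut[al]
add edx,eax
dec ecx
jne L2c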

2: ; divide by the primes in the list

L2:
movaps xmm1,result
divps xmm1,[primelist+ebx] ; divide by 4 floats from the prime list
subps xmm1,%
comiss xmm1,zeros ; (*4 comiss, one per float)

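3: ; sum
if prime: addps xmm7,primes ; xmm7 holds the 4 running sums (addps needs a register destination)
at the end:
haddps xmm7,xmm7
haddps xmm7,xmm7 ; two horizontal adds leave the total in every float
movss sum,xmm7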
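
For comparison, here is the byte-LUT idea from 2c written in C, assuming the table holds n at index n when n is prime and 0 otherwise, so that one lookup per byte replaces the prime test. Both function names are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* lut[n] = n when n is prime, 0 otherwise, for every byte value n. */
static void build_byte_lut(uint8_t lut[256])
{
    for (unsigned n = 0; n < 256; n++) {
        int prime = n >= 2;
        for (unsigned d = 2; prime && d * d <= n; d++)
            if (n % d == 0) prime = 0;
        lut[n] = prime ? (uint8_t)n : 0;
    }
}

/* One table lookup per byte, like xlat, with no branch on "is it prime". */
static uint32_t sum_prime_bytes(const uint8_t *bytes, size_t count,
                                const uint8_t lut[256])
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += lut[bytes[i]];
    return sum;
}
```

The table is only 256 bytes, so every thread can read the same copy and it stays in L1 cache.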

Proc Init
  invoke malloc ; in 64-bit mode it's possible to malloc a huge 4+ GB prime LUT
  open file "prime LUT.bin"
  if eax != true
    invoke Create LUT
    create file "prime LUT.bin"
    block write LUT
    close file
  else
    block read file
    close file
  endif
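
The same Init idea as a C sketch: try to block-read the LUT from disk, and only build and save it when the file is missing or too short. The file name is passed in by the caller, the one-byte-per-number layout is an assumption, and build_lut and load_or_create_lut are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sieve: t[n] = 1 when n is prime, for 0 <= n < size. */
static void build_lut(uint8_t *t, size_t size)
{
    memset(t, 1, size);
    if (size > 0) t[0] = 0;
    if (size > 1) t[1] = 0;
    for (size_t p = 2; p * p < size; p++)
        if (t[p])
            for (size_t m = p * p; m < size; m += p)
                t[m] = 0;
}

/* Returns a malloc'ed LUT of `size` bytes, read from `path` when possible,
   otherwise created and written to `path`. NULL if malloc fails. */
static uint8_t *load_or_create_lut(const char *path, size_t size)
{
    uint8_t *lut = malloc(size);
    if (!lut)
        return NULL;

    FILE *f = fopen(path, "rb");
    if (f) {
        size_t got = fread(lut, 1, size, f);   /* block read */
        fclose(f);
        if (got == size)
            return lut;
        /* short or damaged file: fall through and rebuild */
    }

    build_lut(lut, size);
    f = fopen(path, "wb");
    if (f) {
        fwrite(lut, 1, size, f);               /* block write */
        fclose(f);
    }
    return lut;
}
```

The first run pays for the sieve once; later runs only pay for the file read, which matters more as the LUT grows toward the multi-GB sizes mentioned above.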


