SIMT challenge

Started by daydreamer, December 15, 2024, 10:41:52 PM


daydreamer

I think I'll return to a task I earlier solved with SIMD + LUT, and follow the student rules to solve it with SIMT.
First, a way to check with an API, or let the user enter, the number of cores on their computer.
1: generate x million random numbers
Probably by calling rand in a loop
2: check whether each one is a prime number
3: add those prime numbers together
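For the core-count step, a minimal C++ sketch, assuming standard C++11 threads rather than a Win32 call. The function name `pick_thread_count` and the convention that 0 means "no user entry" are hypothetical. Note that `std::thread::hardware_concurrency()` reports logical processors (so hyper-threads are counted) and is allowed to return 0 when the value cannot be determined.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: use the user's number if given, else ask the runtime.
// user_choice == 0 is taken to mean "the user did not enter anything".
unsigned pick_thread_count(unsigned user_choice) {
    if (user_choice != 0) return user_choice;
    unsigned detected = std::thread::hardware_concurrency(); // may be 0
    unsigned result = detected != 0 ? detected : 1;          // fall back to 1 thread
    assert(result > 0);
    return result;
}
```

Calling `pick_thread_count(0)` on a 4-core machine without hyper-threading would typically give 4, but the exact value depends on the system.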

Main starts x threads, sending 1, 2 or 3 to each thread.
On a 16-core CPU, how many threads get 1, 2 or 3 depends on which task is the most and which is the least computationally expensive.
Each thread starts with a switch/case:
Case 1
Invoke random
Case 2
Invoke primes
Case 3
Invoke sum

An option is for a thread to return when it has finished tasks 1 and 2, and then join in helping other threads that aren't finished yet.
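The switch/case plan above can be sketched roughly as follows. This is only one possible reading, with several assumptions: it uses C++ `std::thread` instead of MASM, it runs the three tasks as phases (every thread does task 1, then task 2, then task 3) because task 2 needs the output of task 1, and it swaps `rand` for a per-thread `std::mt19937` since `rand` is not required to be thread-safe. The names `Task`, `Slice`, `worker`, `run_phase` and `sum_random_primes`, the seeds and the 0..99999 value range are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

enum Task { kRandom = 1, kPrimes = 2, kSum = 3 };

// One thread's share of the array, plus its private seed and partial sum.
struct Slice {
    std::uint32_t* data;
    std::size_t size;
    std::uint32_t seed;
    std::uint64_t sum;
};

// Plain trial division; good enough for small test values.
static bool is_prime(std::uint32_t n) {
    if (n < 2) return false;
    for (std::uint32_t d = 2; d <= n / d; ++d)
        if (n % d == 0) return false;
    return true;
}

// Each thread receives a task number and dispatches on it.
static void worker(Task task, Slice* s) {
    switch (task) {
    case kRandom: {                       // task 1: fill the slice
        std::mt19937 gen(s->seed);
        std::uniform_int_distribution<std::uint32_t> dist(0, 99999);
        for (std::size_t i = 0; i < s->size; ++i) s->data[i] = dist(gen);
        break;
    }
    case kPrimes:                         // task 2: zero out the non-primes
        for (std::size_t i = 0; i < s->size; ++i)
            if (!is_prime(s->data[i])) s->data[i] = 0;
        break;
    case kSum:                            // task 3: partial sum of what is left
        s->sum = 0;
        for (std::size_t i = 0; i < s->size; ++i) s->sum += s->data[i];
        break;
    }
}

// Start one thread per slice for the given task and wait for all of them.
static void run_phase(Task task, std::vector<Slice>& slices) {
    std::vector<std::thread> pool;
    for (Slice& s : slices) pool.emplace_back(worker, task, &s);
    for (std::thread& t : pool) t.join();
}

std::uint64_t sum_random_primes(std::vector<std::uint32_t>& numbers, unsigned threads) {
    assert(threads > 0);
    std::vector<Slice> slices(threads);
    std::size_t chunk = numbers.size() / threads;
    for (unsigned t = 0; t < threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t + 1 == threads) ? numbers.size() : begin + chunk;
        slices[t] = Slice{ numbers.data() + begin, end - begin, 1234u + t, 0 };
    }
    run_phase(kRandom, slices);
    run_phase(kPrimes, slices);
    run_phase(kSum, slices);
    std::uint64_t total = 0;
    for (const Slice& s : slices) total += s.sum;
    return total;
}
```

Because every thread owns its own slice, no locking is needed inside a phase. Letting a finished thread help others, as suggested above, would need a shared work queue instead of fixed slices.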

Another idea:
Two threads generate the random numbers: one thread starts from the beginning of the array and the second thread starts from the end of the array, and they stop when they reach the same address.
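A minimal sketch of the two-ended fill, again in C++ as an assumption. Comparing the two cursors directly would be a race, so this version lets both threads claim elements from a shared atomic counter; they stop when nothing is left to claim, which is the point where the cursors meet. The name `fill_from_both_ends`, the seeds and the 1..99999 range are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

void fill_from_both_ends(std::vector<std::uint32_t>& data) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());
    std::atomic<std::ptrdiff_t> remaining(n);   // elements not yet claimed

    // pos is this thread's private cursor, step is +1 (front) or -1 (back).
    auto fill = [&](std::ptrdiff_t pos, std::ptrdiff_t step, std::uint32_t seed) {
        std::mt19937 gen(seed);
        std::uniform_int_distribution<std::uint32_t> dist(1, 99999);
        // fetch_sub returns the old value: > 0 means one element was claimed.
        while (remaining.fetch_sub(1) > 0) {
            data[static_cast<std::size_t>(pos)] = dist(gen);
            pos += step;
        }
    };

    std::thread front(fill, std::ptrdiff_t{0}, std::ptrdiff_t{1}, 1u);
    std::thread back(fill, n - 1, std::ptrdiff_t{-1}, 2u);
    front.join();
    back.join();
    assert(remaining.load() <= 0);              // every element was claimed
}
```

The front thread writes indices 0, 1, 2, ... and the back thread writes n-1, n-2, ...; since exactly n claims succeed in total, the two ranges cover the array without overlapping. One atomic operation per element is slow, so claiming blocks of elements at a time would be the obvious refinement.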

my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

NoCforMe

I don't get it; what is the point of this whole exercise?
Assembly language programming should be fun. That's why I do it.

daydreamer

When even old computers have 4 or more cores, it's easier to code several threads with standard x86 general-purpose register code than with exotic SIMD opcodes. It's also more flexible: each core can run a different algo/proc, whereas 128-bit SSE runs the same algo on 4 x 32-bit values.

I have 4 cores on my newest computer and want to try to learn to use all 4.

Siekmanski

Quote from: NoCforMe on December 16, 2024, 08:42:30 AM
I don't get it; what is the point of this whole exercise?

A challenge to excel in programming by using the logical and analytical part of your brain.  :cool:
Creative coders use backward thinking techniques as a strategy.

zedd151

All I see here is a word salad.  :biggrin:

And only a 'concept' of an 'idea'.   :tongue:
:cool:

daydreamer

I have no scalar prime-testing code yet (sieve and divide). I only have experimental SSE code: REAL4s tested with many divps, using divisors from 2.0 to around 300.0, with a bug:
it's not capable of detecting all the lower primes in the 2.0-300.0 range, and it's limited to the REAL4 range of ca 9 million.
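For the scalar "sieve" part mentioned above, a minimal sketch of a Sieve of Eratosthenes in C++ that builds a byte lookup table. This is an assumed integer-only approach, not the poster's code; the name `make_prime_table` is hypothetical. A table like this could also serve as a reference to check the SSE results against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns a table where table[n] == 1 when n is prime, for 0 <= n <= limit.
std::vector<std::uint8_t> make_prime_table(std::size_t limit) {
    assert(limit < static_cast<std::size_t>(-1));   // limit + 1 must not wrap
    std::vector<std::uint8_t> table(limit + 1, 1);
    table[0] = 0;
    if (limit >= 1) table[1] = 0;
    for (std::size_t p = 2; p * p <= limit; ++p) {
        if (!table[p]) continue;                    // p is composite, skip it
        // Multiples below p * p were already crossed out by smaller primes.
        for (std::size_t m = p * p; m <= limit; m += p) table[m] = 0;
    }
    return table;
}
```

With `auto t = make_prime_table(300);`, a test such as `t[n]` replaces the run of divisions by 2.0 to 300.0 for any n up to 300, and the same table supplies the divisors for testing larger numbers by trial division.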

 


sinsi

Some older (but still recent) CPUs would reduce the clock if full AVX was used on multi cores because of the heat.
With Intel CPUs you also need to pick a CPU that's not hyper-threaded.

Multithreading isn't easy, that's why most apps don't use it.
Also, CPU multithreading is a whole lot different to using an OS's thread functions.

daydreamer

So far:
scalar SSE float compares,
and after that
packed SSE float compares.

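// primesx.cpp
// Uses _asm inline assembly, so it needs a 32-bit x86 MSVC build.

#include "pch.h"
#include <iostream>
using namespace std;
float zero = 0.0;
int pflag = 0; // set to 1 by the scalar loop when f matches a table entry
// Prime table as floats, zero-padded at the end
alignas(16) float flut[]{ 2.0,3.0,5.0,7.0,11.0,13.0,17.0,19.0,0.0,0.0,0.0 };
// Receives the 4 lanes of the packed compare result
alignas(16) float arr[]{ 0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 };
// Byte LUT: lut[n] == n for the primes up to 13, otherwise 0
char lut[]{ 0, 0, 2, 3, 0, 5, 0, 7, 0, 0,0,11,0,13,0,0,0,0,0,0 };
int main()
{
    int i, j = 3;
    float f = 3.0;
    float fresult = 0;
    cout << "Primesx\n";
    // Pass 1: plain lookup in the byte LUT
    for (i = 0; i < 14; i++) {
        cout << i << " ";
        if (lut[i] != 0) cout << "prime " << (int)lut[i] << " ";
        cout << f << " ";
        //f = f + 1.0;
    }
    cout << f << " ";
    cout << "\n\n\n\n";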
    // Pass 2: scalar SSE, compare f against one flut entry at a time
    for (j = 1; j < 14; j = j + 1) {
        f = (float)j; //test all floats
        _asm {
            push ebx

            mov ecx, 11               ; number of entries in flut
            lea ebx, flut             ; ebx -> flut[0]
        L2:
            movss xmm0, f
            movss xmm1, [ebx]
            ucomiss xmm0, xmm1        ; ZF = 1 when f == [ebx]
            jne L1                    ; not equal, try the next entry
            mov eax, 1
            mov pflag, eax            ; found prime
            movss xmm0, [ebx]
            movss fresult, xmm0
            jmp L3                    ; have found prime, jump out of loop
        L1:
            mov eax, 0
            mov pflag, eax
            add ebx, 4
            dec ecx
            jne L2
        L3:
            pop ebx
        }
        fresult = fresult * pflag;    // 0 when no table entry matched
        cout << fresult << " ";
    }//j
    cout << "\n";
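    // Pass 3: packed SSE, compare f against 8 table primes, 4 per compare
    for (j = 2; j < 20; j++) {
        f = (float)j;

        _asm {
            push ebx
            lea ebx, flut
            movss xmm0, f
            shufps xmm0, xmm0, 0      ; broadcast f to all 4 lanes
            movups xmm1, [ebx]        ; 2, 3, 5, 7
            movaps xmm3, xmm1
            cmpeqps xmm1, xmm0        ; all-ones mask in the lanes equal to f
            andps xmm1, xmm3          ; keep the matching prime, zero elsewhere
            movaps xmm7, xmm1
            add ebx, 16
            movups xmm1, [ebx]        ; 11, 13, 17, 19
            movaps xmm3, xmm1
            cmpeqps xmm1, xmm0
            andps xmm1, xmm3
            orps xmm1, xmm7           ; merge the results of both groups
            movups arr, xmm1
            haddps xmm1, xmm1         ; two horizontal adds sum all 4 lanes (SSE3)
            haddps xmm1, xmm1
            movss fresult, xmm1

            pop ebx
        }
        cout << "xmm reg float 0,float 1,float 2,float 3 : " << arr[0] << " " << arr[1] << " " << arr[2] << " " << arr[3] << "\n";
        cout << fresult << " zero = non prime\n";

    }//second time j
}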
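For readers without a 32-bit MSVC toolchain, the packed compare can also be sketched with SSE intrinsics from `<xmmintrin.h>`, which x86 builds of MSVC, GCC and Clang all provide. This is a rough equivalent of the packed loop, not a line-by-line translation: the horizontal add is replaced by a store and a plain sum, so only SSE1 is needed. The names `kPrimes` and `match_prime` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics

// The same 8 table primes as flut, 16-byte aligned for _mm_load_ps.
alignas(16) static const float kPrimes[8] = { 2, 3, 5, 7, 11, 13, 17, 19 };

// Returns f when f equals one of the 8 table primes, otherwise 0.
float match_prime(float f) {
    assert(reinterpret_cast<std::uintptr_t>(kPrimes) % 16 == 0);
    __m128 key = _mm_set1_ps(f);                 // broadcast f to all 4 lanes
    __m128 lo = _mm_load_ps(kPrimes);            // 2, 3, 5, 7
    __m128 hi = _mm_load_ps(kPrimes + 4);        // 11, 13, 17, 19
    // The compare gives an all-ones mask per equal lane; AND keeps that prime.
    __m128 hit_lo = _mm_and_ps(_mm_cmpeq_ps(lo, key), lo);
    __m128 hit_hi = _mm_and_ps(_mm_cmpeq_ps(hi, key), hi);
    __m128 hit = _mm_or_ps(hit_lo, hit_hi);      // at most one lane is non-zero
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, hit);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

For example, `match_prime(13.0f)` gives 13 and `match_prime(4.0f)` gives 0, matching the "zero = non prime" convention of the listing above.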
