Test piece for starting threads that match the OS count.

Started by hutch--, December 29, 2019, 07:30:59 PM

hutch--

The test piece gets the thread count from the OS, then starts that many threads. It's working OK on the computers I have to test on; if anyone has the time, it would be useful to see if it works OK on other processors.
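
For readers without the attachment, here is a minimal sketch of the same idea in portable C++ rather than MASM. It is not hutch--'s code: the helper names `os_thread_count` and `start_threads` are made up for illustration, and `std::thread::hardware_concurrency()` stands in for whatever OS call the test piece actually uses.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: ask the runtime how many hardware threads exist.
// hardware_concurrency() may return 0 when the count is unknown,
// so fall back to a single thread in that case.
unsigned os_thread_count() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}

// Start `count` threads; each one marks its own slot to show that it ran.
// Returns how many threads started and finished.
unsigned start_threads(unsigned count) {
    std::vector<char> started(count, 0);   // one slot per thread, never shared
    std::vector<std::thread> threads;
    threads.reserve(count);
    for (unsigned i = 0; i < count; ++i) {
        threads.emplace_back([&started, i] { started[i] = 1; });
    }
    for (auto& t : threads) t.join();      // wait for every thread to end
    unsigned ran = 0;
    for (char s : started) ran += static_cast<unsigned>(s);
    assert(ran == count);
    return ran;
}
```

Calling `start_threads(os_thread_count())` mirrors what the test piece is described as doing: one thread per hardware thread reported by the system.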

Siekmanski

Working OK, i7-4930K Win 8.1

1
37211596663992
2
37211596663996
3
37211596664000
4
37211596664004
37211596664008
5
37211596664012
6
37211596664016
7
8
37211596664020
9
37211596664024
10
37211596664028
37211596664032
11
12
37211596664036

Creative coders use backward thinking techniques as a strategy.


avcaballero

i5 W10

37211596671724
1
2
37211596675812
3
37211596677828
4
37211596668536
5
37211596673436
6
37211596657124
Press any key to continue...

jj2007

Core i5

372115966588361

237211596658832

337211596658744

372115966562644

hutch--

Looks like it's working OK.

Jose, it should be thread safe, but the order tends to be a little out as some threads get emptied faster than others.
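
A rough sketch of why intact output and ordered output are separate questions, written in C++ as an assumption about the general technique (the thread does not show how the test piece serialises its console writes). Each worker builds its whole line first and appends it under a mutex, so lines cannot run into each other, but the order of the lines still depends on the scheduler. `run_logged` is a hypothetical name and the logged value is a placeholder.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Each worker appends one complete line "<ordinal> <value>" to a shared log.
// The mutex keeps every line intact; it does not impose any ordering.
std::vector<std::string> run_logged(unsigned count) {
    std::vector<std::string> lines;
    std::mutex lines_lock;
    std::vector<std::thread> threads;
    threads.reserve(count);
    for (unsigned i = 1; i <= count; ++i) {
        threads.emplace_back([&lines, &lines_lock, i] {
            // Build the line outside the lock, publish it inside the lock.
            std::string line = std::to_string(i) + " " + std::to_string(i * 100);
            std::lock_guard<std::mutex> guard(lines_lock);
            lines.push_back(line);
        });
    }
    for (auto& t : threads) t.join();
    assert(lines.size() == count);   // every thread logged exactly once
    return lines;
}
```

The returned lines are always complete, yet two runs may list them in a different order.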

TimoVJL

AMD Ryzen 5 3400G

1
37211596660968
2
37211596663268
3
37211596663652
4
37211596659052
5
37211596663748
6
37211596663072
7
37211596657404
8
37211596662712
May the source be with you

daydreamer

Intel Core i5-7200U CPU @ 2.5 GHz 2.71 GHz, 20 GB memory
1
37211596659364
2
37211596654280
337211596658548

437211596660616

Press any key to continue...

Maybe one of the cores is also being used for output to the screen or by the OS, and that affects the result?
Maybe hyperthreads don't empty at the same rate as physical cores?

Would it be best to do the output in one thread and the calculation in another?
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

aw27

But I think you want to print the thread launching ordinal, not the memory address; that's why you are incrementing r13 after each launch.

1
Passed ordinal number: 1
2
Passed ordinal number: 2
3
Passed ordinal number: 3
4
Passed ordinal number: 4
5
Passed ordinal number: 5
6
Passed ordinal number: 6
7
Passed ordinal number: 7
8
Passed ordinal number: 8
9
Passed ordinal number: 9
10
Passed ordinal number: 10
11
Passed ordinal number: 11
12
Passed ordinal number: 12
Press any key to continue...

hutch--

I wanted the thread handle for future use of the thread; the numbering of each thread was only to make sure that each thread started. Mixing the console output between the two produced some out-of-order output, and the handles should have been recorded and displayed later, but it has done what I expected it to do.
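
One way to picture "record now, display later", again as a C++ sketch rather than the original MASM: each thread gets its ordinal by value, stores it with its thread id in its own slot, and only the launching thread prints once everything has been joined. `ThreadRecord` and `launch_and_record` are hypothetical names, and `std::thread::id` is only a stand-in for the OS thread handle.

```cpp
#include <cassert>
#include <cstdio>
#include <thread>
#include <vector>

// What each thread leaves behind for later display.
struct ThreadRecord {
    unsigned ordinal = 0;   // launch order, handed to the thread by value
    std::thread::id id;     // stand-in for the OS thread handle
};

std::vector<ThreadRecord> launch_and_record(unsigned count) {
    std::vector<ThreadRecord> records(count);
    std::vector<std::thread> threads;
    threads.reserve(count);
    for (unsigned i = 0; i < count; ++i) {
        // `i` is captured by value, so each thread keeps its own ordinal
        // even though the loop counter keeps changing.
        threads.emplace_back([&records, i] {
            records[i].ordinal = i + 1;
            records[i].id = std::this_thread::get_id();
        });
    }
    for (auto& t : threads) t.join();
    // All threads are finished: print from one thread, in launch order.
    for (const auto& r : records) {
        assert(r.ordinal != 0);   // every slot was filled
        std::printf("Passed ordinal number: %u\n", r.ordinal);
    }
    return records;
}
```

Because nothing is printed until after the joins, the listing comes out in launch order regardless of which thread ran first.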

What I am looking at is an adaptable technique that starts as many threads as are available on the processor running the code, suspends them until all are up and running, and then resumes execution once the tasks are delegated to each thread.
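
The post does not say which mechanism is intended. On Windows, one option is `CreateThread` with `CREATE_SUSPENDED` followed by `ResumeThread`. The portable C++ sketch below swaps in a different technique, a start gate built on a condition variable: workers park at the gate, the launcher delegates the tasks, then opens the gate for all of them at once. `StartGate`, `run_gated` and the doubling "work" are illustrative assumptions only.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// A simple start gate: workers block in wait() until the launcher calls open().
class StartGate {
public:
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return open_; });
    }
    void open() {
        {
            std::lock_guard<std::mutex> lock(m_);
            open_ = true;
        }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool open_ = false;
};

// Start `count` workers, hand each one a task value while they are parked,
// then release them together. Returns the per-thread results.
std::vector<int> run_gated(unsigned count) {
    StartGate gate;
    std::vector<int> task(count, 0);
    std::vector<int> result(count, 0);
    std::vector<std::thread> threads;
    threads.reserve(count);
    for (unsigned i = 0; i < count; ++i) {
        threads.emplace_back([&gate, &task, &result, i] {
            gate.wait();               // parked until the tasks are delegated
            result[i] = task[i] * 2;   // placeholder work
        });
    }
    for (unsigned i = 0; i < count; ++i) {
        task[i] = static_cast<int>(i) + 1;   // delegate a task to each thread
    }
    gate.open();                             // resume everyone
    for (auto& t : threads) t.join();
    assert(result.size() == count);
    return result;
}
```

The gate's mutex also guarantees that every worker sees its delegated task, since the tasks are written before `open()` takes the lock.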

As usual, thanks to all for doing the test as I currently only have 2 computers that I can test on.

CMalcheski

For optimal execution speed, don't make the assumption that the number of physical cores is always the number of available cores.  In my travels over the years I've seen enormous variation in execution speed overall depending on how many worker threads I've created (in this case, for downloading files).  You can never be sure what the OS is doing behind the scenes, and what resources it's claiming to do it.  Even if you did a calibration at run time to determine the most efficient number of concurrent threads, some massive operating system update could kick in right after the calibration and everything the calibration determined is suddenly off.  There is no perfect solution so I just set a max # of concurrent threads as the number of physical cores on the processor. 

aw27

Quote from: CMalcheski on April 03, 2020, 11:43:35 AM
This is widespread knowledge. Try to produce some code to prove whatever you want.

hutch--

There is more to thread counts than core count * 2 (if your processor has hyperthreading). A couple of factors affect what you can get out of it: the level of individual core saturation (very high intensity processing) and the speed of the core/thread.

If you are matching a thread count to data dribbling in over an internet connection, you can run a much higher thread count, as the saturation level for each thread is low. But if you are doing high-intensity processing, you are stuck with the core count and hardware thread count, and going over this simply increases the thread overhead.
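
The distinction above can be written down as a small selection rule. This is a hedged C++ sketch, not a measured recommendation: `pick_thread_count`, the `Workload` enum and the default multiplier of 4 for low-saturation work are all hypothetical and would need tuning on real hardware.

```cpp
#include <algorithm>
#include <cassert>

enum class Workload { CpuBound, IoBound };

// Hypothetical heuristic:
//  - CPU-bound work: one software thread per hardware thread, no more.
//  - I/O-bound work: threads mostly wait, so allow a multiple of that.
// `io_multiplier` is an assumed tuning knob, not a known-good constant.
unsigned pick_thread_count(Workload kind, unsigned hw_threads,
                           unsigned io_multiplier = 4) {
    assert(io_multiplier >= 1);
    unsigned base = std::max(hw_threads, 1u);   // guard against a 0 report
    return kind == Workload::CpuBound ? base : base * io_multiplier;
}
```

For example, with 8 hardware threads this returns 8 for saturating work and 32 for work that mostly waits on a connection.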

daydreamer

Those examples showing the use of multiple threads use the same code for all threads. But when is it an advantage or a disadvantage to run two threads with one on a physical core and one on a hyperthread, versus two on physical cores?
The same code with different values in registers/flags makes the least difference in performance, but if you have two completely different procs running in two threads, it will make a difference whether they run on hyperthreads or physical cores.

hutch--

Each thread has its own set of registers; the only variation when running identical code comes from differences in core speed. When you have different code on different threads, there is no means of comparison. You can clock each process, but that requires the thread to finish.
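
A minimal C++ sketch of clocking identical code on several threads, under the assumption that a simple summing loop is an acceptable placeholder workload. `busy_sum`, `Timing` and `time_threads` are hypothetical names. Each thread times itself with `std::chrono::steady_clock`, and the figures are only read after `join()`, which matches the point that the thread has to finish first.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Placeholder workload: sum the integers below `n`.
std::uint64_t busy_sum(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t k = 0; k < n; ++k) total += k;
    return total;
}

struct Timing {
    std::uint64_t result = 0;
    std::chrono::steady_clock::duration elapsed{};   // zero until the thread ends
};

// Run the same workload on `count` threads and clock each one separately.
std::vector<Timing> time_threads(unsigned count, std::uint64_t n) {
    std::vector<Timing> timings(count);
    std::vector<std::thread> threads;
    threads.reserve(count);
    for (unsigned i = 0; i < count; ++i) {
        threads.emplace_back([&timings, i, n] {
            auto begin = std::chrono::steady_clock::now();
            timings[i].result = busy_sum(n);
            timings[i].elapsed = std::chrono::steady_clock::now() - begin;
        });
    }
    for (auto& t : threads) t.join();   // timings are valid only after this
    for (const auto& tm : timings) assert(tm.elapsed.count() >= 0);
    return timings;
}
```

Comparing the `elapsed` values across slots shows the per-thread spread for identical code; with different code per thread the numbers are not comparable, as noted above.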