Timings please, just for fun. Credits go to qWord :thumbsup:
AMD Athlon Gold 3150U with Radeon Graphics (SSE4)
5503 cycles for 100 * Masm32 hex$
4298 cycles for 100 * MasmBasic Hex$
352 cycles for 100 * qWord mmx hex$
428 cycles for 100 * qWord xmm hex$
44520 cycles for 100 * CRT hex$
5675 cycles for 100 * Masm32 hex$
4338 cycles for 100 * MasmBasic Hex$
377 cycles for 100 * qWord mmx hex$
431 cycles for 100 * qWord xmm hex$
44424 cycles for 100 * CRT hex$
5499 cycles for 100 * Masm32 hex$
4305 cycles for 100 * MasmBasic Hex$
353 cycles for 100 * qWord mmx hex$
427 cycles for 100 * qWord xmm hex$
44564 cycles for 100 * CRT hex$
5479 cycles for 100 * Masm32 hex$
4320 cycles for 100 * MasmBasic Hex$
352 cycles for 100 * qWord mmx hex$
433 cycles for 100 * qWord xmm hex$
47006 cycles for 100 * CRT hex$
5513 cycles for 100 * Masm32 hex$
4400 cycles for 100 * MasmBasic Hex$
361 cycles for 100 * qWord mmx hex$
429 cycles for 100 * qWord xmm hex$
44405 cycles for 100 * CRT hex$
Averages:
5505 cycles for Masm32 hex$
4321 cycles for MasmBasic Hex$
355 cycles for qWord mmx hex$
429 cycles for qWord xmm hex$
44503 cycles for CRT hex$
16 bytes for Masm32 hex$
12 bytes for MasmBasic Hex$
92 bytes for qWord mmx hex$
124 bytes for qWord xmm hex$
29 bytes for CRT hex$
Masm32 hex$ 1234ABCD
MasmBasic Hex$ 1234ABCD
qWord mmx hex$ 1234ABCD
qWord xmm hex$ 1234ABCD
CRT hex$ 1234ABCD
Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz (SSE4)
4132 cycles for 100 * Masm32 hex$
8482 cycles for 100 * MasmBasic Hex$
797 cycles for 100 * qWord mmx hex$
1017 cycles for 100 * qWord xmm hex$
75531 cycles for 100 * CRT hex$
4105 cycles for 100 * Masm32 hex$
8427 cycles for 100 * MasmBasic Hex$
797 cycles for 100 * qWord mmx hex$
1018 cycles for 100 * qWord xmm hex$
75587 cycles for 100 * CRT hex$
4090 cycles for 100 * Masm32 hex$
8443 cycles for 100 * MasmBasic Hex$
795 cycles for 100 * qWord mmx hex$
1020 cycles for 100 * qWord xmm hex$
75549 cycles for 100 * CRT hex$
4089 cycles for 100 * Masm32 hex$
8482 cycles for 100 * MasmBasic Hex$
801 cycles for 100 * qWord mmx hex$
1017 cycles for 100 * qWord xmm hex$
75745 cycles for 100 * CRT hex$
4262 cycles for 100 * Masm32 hex$
8521 cycles for 100 * MasmBasic Hex$
808 cycles for 100 * qWord mmx hex$
1019 cycles for 100 * qWord xmm hex$
75669 cycles for 100 * CRT hex$
Averages:
4109 cycles for Masm32 hex$
8469 cycles for MasmBasic Hex$
798 cycles for qWord mmx hex$
1018 cycles for qWord xmm hex$
75602 cycles for CRT hex$
16 bytes for Masm32 hex$
12 bytes for MasmBasic Hex$
92 bytes for qWord mmx hex$
124 bytes for qWord xmm hex$
29 bytes for CRT hex$
Masm32 hex$ 1234ABCD
MasmBasic Hex$ 1234ABCD
qWord mmx hex$ 1234ABCD
qWord xmm hex$ 1234ABCD
CRT hex$ 1234ABCD
--- ok ---
Quote: Timings please, just for fun.
Cycle counts?? :wink2:
Thanks, sudoku :thumbsup:
It seems my attempt to port qWord's algo from mmx to xmm failed miserably :sad:
Need some more intels, so I can compare others to my computer.
Masm32 beats MasmBasic. No, it can't be. :joking:
13th Gen Intel(R) Core(TM) i9-13900KF (SSE4)
1041 cycles for 100 * Masm32 hex$
2648 cycles for 100 * MasmBasic Hex$
317 cycles for 100 * qWord mmx hex$
246 cycles for 100 * qWord xmm hex$
18828 cycles for 100 * CRT hex$
1046 cycles for 100 * Masm32 hex$
2578 cycles for 100 * MasmBasic Hex$
314 cycles for 100 * qWord mmx hex$
249 cycles for 100 * qWord xmm hex$
18788 cycles for 100 * CRT hex$
1034 cycles for 100 * Masm32 hex$
2655 cycles for 100 * MasmBasic Hex$
315 cycles for 100 * qWord mmx hex$
248 cycles for 100 * qWord xmm hex$
18821 cycles for 100 * CRT hex$
1035 cycles for 100 * Masm32 hex$
2607 cycles for 100 * MasmBasic Hex$
334 cycles for 100 * qWord mmx hex$
254 cycles for 100 * qWord xmm hex$
18860 cycles for 100 * CRT hex$
1051 cycles for 100 * Masm32 hex$
2629 cycles for 100 * MasmBasic Hex$
321 cycles for 100 * qWord mmx hex$
248 cycles for 100 * qWord xmm hex$
18909 cycles for 100 * CRT hex$
Averages:
1041 cycles for Masm32 hex$
2628 cycles for MasmBasic Hex$
318 cycles for qWord mmx hex$
248 cycles for qWord xmm hex$
18836 cycles for CRT hex$
16 bytes for Masm32 hex$
12 bytes for MasmBasic Hex$
92 bytes for qWord mmx hex$
124 bytes for qWord xmm hex$
29 bytes for CRT hex$
Masm32 hex$ 1234ABCD
MasmBasic Hex$ 1234ABCD
qWord mmx hex$ 1234ABCD
qWord xmm hex$ 1234ABCD
CRT hex$ 1234ABCD
18900 cycles for CRT vs 248 for xmm. If all CRT functions are that much slower than a library of fast asm functions, does that mean a C program using CRT functions is that much slower compared to an asm program using an asm library?
It's about 76 times faster (18900 / 248 ≈ 76.2).
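For readers wondering where a gap like that can come from, here is a minimal C sketch (not the code that was timed above) contrasting a table lookup with a formatted print. The names hex32_table and hex32_crt are made up for illustration, and the snprintf call is only an assumption about what a CRT-based hex$ might do; the thread does not show its source. A formatted print has to parse its format string on every call, which is one plausible reason for part of the difference.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes 8 uppercase hex digits plus a NUL into out. */
void hex32_table(uint32_t value, char out[9])
{
    static const char digits[] = "0123456789ABCDEF";
    assert(out != NULL);
    for (int i = 7; i >= 0; --i) {      /* fill from the last digit backwards */
        out[i] = digits[value & 0xFu];  /* low nibble selects the character */
        value >>= 4;
    }
    out[8] = '\0';
}

/* Assumed CRT route: same result, but the format string is parsed at run time. */
void hex32_crt(uint32_t value, char out[9])
{
    assert(out != NULL);
    snprintf(out, 9, "%08lX", (unsigned long)value);
}
```

Both functions produce the same text for 0x1234ABCD as the test output above, i.e. 1234ABCD. How far apart they are in cycles would have to be measured on each machine.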
Quote from: daydreamer on April 17, 2024, 03:25:59 AMdoes that mean a c program using CRT functions is that much slower compared to asm program using asm library ?
Well, of course it means that that particular CRT function is that much slower compared to the other (assembler) functions. It doesn't necessarily mean that the program as a whole is that much slower. And as I always try to point out, it may not mean anything at all if these functions are only used to display user input or output, instead of in a way which would significantly affect overall program speed (say, if one is processing 100,000 numbers in a spreadsheet or something).
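The point about whole-program speed can be put into numbers with Amdahl's law: if a fraction f of the run time is spent in the conversion and that part becomes s times faster, the overall speedup is 1 / ((1 - f) + f / s). A small C sketch follows; the function name and the example fractions are assumptions for illustration, not measurements from this thread.

```c
#include <assert.h>

/* Overall speedup when `fraction` of the run time (0..1) becomes
   `local_speedup` times faster and the rest stays unchanged. */
double overall_speedup(double fraction, double local_speedup)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    assert(local_speedup > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

With s = 76, overall_speedup(0.01, 76.0) is about 1.01 (the program gets roughly 1% faster), while overall_speedup(0.5, 76.0) is about 1.97. So the same routine can be irrelevant in one program and decisive in another.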
Quote from: sudoku on April 17, 2024, 02:21:54 AMMasm32 beats MasmBasic. No, it can't be. :joking:
You are right - I forgot the fast option :thumbsup:
mov somevar, Hex$(123456789, fast) (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1198) (DWORDs only)
AMD Athlon Gold 3150U with Radeon Graphics (SSE4)
Averages:
5528 cycles for Masm32 hex$
275 cycles for MasmBasic Hex$
357 cycles for qWord mmx hex$
433 cycles for qWord xmm hex$
44624 cycles for CRT hex$
Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz (SSE4)
4105 cycles for 100 * Masm32 hex$
624 cycles for 100 * MasmBasic Hex$
797 cycles for 100 * qWord mmx hex$
1018 cycles for 100 * qWord xmm hex$
75615 cycles for 100 * CRT hex$
4128 cycles for 100 * Masm32 hex$
603 cycles for 100 * MasmBasic Hex$
801 cycles for 100 * qWord mmx hex$
1036 cycles for 100 * qWord xmm hex$
75388 cycles for 100 * CRT hex$
4114 cycles for 100 * Masm32 hex$
594 cycles for 100 * MasmBasic Hex$
801 cycles for 100 * qWord mmx hex$
1021 cycles for 100 * qWord xmm hex$
75292 cycles for 100 * CRT hex$
4110 cycles for 100 * Masm32 hex$
593 cycles for 100 * MasmBasic Hex$
801 cycles for 100 * qWord mmx hex$
1020 cycles for 100 * qWord xmm hex$
75426 cycles for 100 * CRT hex$
4103 cycles for 100 * Masm32 hex$
594 cycles for 100 * MasmBasic Hex$
798 cycles for 100 * qWord mmx hex$
1021 cycles for 100 * qWord xmm hex$
75507 cycles for 100 * CRT hex$
Averages:
4110 cycles for Masm32 hex$
597 cycles for MasmBasic Hex$
800 cycles for qWord mmx hex$
1021 cycles for qWord xmm hex$
75440 cycles for CRT hex$
16 bytes for Masm32 hex$
16 bytes for MasmBasic Hex$
92 bytes for qWord mmx hex$
124 bytes for qWord xmm hex$
29 bytes for CRT hex$
Masm32 hex$ 1234ABCD
MasmBasic Hex$ 1234ABCD
qWord mmx hex$ 1234ABCD
qWord xmm hex$ 1234ABCD
CRT hex$ 1234ABCD
--- ok ---
Turbo Mode! :biggrin:
13th Gen Intel(R) Core(TM) i9-13900KF (SSE4)
Averages:
1036 cycles for Masm32 hex$
132 cycles for MasmBasic Hex$
306 cycles for qWord mmx hex$
249 cycles for qWord xmm hex$
18706 cycles for CRT hex$
Showoff
Thanks, folks :thup:
Honestly, I had forgotten the fast option, which was tested in this thread (https://masm32.com/board/index.php?msg=122102) in July 2023 :cool:
Quote from: NoCforMe on April 17, 2024, 06:28:04 AMit may not mean anything at all if these functions are only used to display user input or output
That would indeed be nonsense, but nobody proposed that.
Quote from: jj2007 on April 17, 2024, 07:43:15 AMThanks, folks :thup:
Honestly, I had forgotten the fast option, which was tested in this thread (https://masm32.com/board/index.php?msg=122102) in July 2023 :cool:
Quote from: NoCforMe on April 17, 2024, 06:28:04 AMit may not mean anything at all if these functions are only used to display user input or output
That would indeed be nonsense, but nobody proposed that.
No, nobody proposed that, but everybody seems to be ignoring that.
I'd be willing to bet that 80-90% of use cases for these (numeric conversion) functions are for displaying or inputting small amounts of user output/input. Can't prove it, of course, but I'm pretty sure.
How many people here are actually writing applications where it does make a difference in speed? Someone somewhere here mentioned spreadsheets, but really, who besides Micro$oft is actually coding a spreadsheet?
Quote from: NoCforMe on April 17, 2024, 08:22:48 AMwho besides Micro$oft is actually coding a spreadsheet?
May 28, 2014, 01:53:07 PM: Spreadsheet viewer (https://masm32.com/board/index.php?topic=3231.0) (nowadays an ultrafast spreadsheet editor)
April 01, 2024, 09:53:55 AM: Obsession with speed (https://masm32.com/board/index.php?msg=128433)
You are missing the point. An application that uses a slow library will be slow because everything it does depends on slow functions. Every sane programmer will avoid using slow functions in an innermost loop, but if the only library he has is as slow as the CRT, then inevitably his applications will be, ehm, a bit slow. And the World of software is full of awfully slow applications. Oh, btw, LibreOffice seems to use a slow library: I once measured its spreadsheet editor Calc's sorting performance against M$ Excel: horrible, more than a factor of 10 slower (others have done that, too (https://gigazine.net/gsc_news/en/20191216-spreadsheet-benchmark/)). Which doesn't mean that Excel is fast, of course - my spreadsheet editor sorts columns considerably faster than Excel...
Quote: In addition, when Google Sheet exceeded 20,000 lines, processing took 1.5 seconds, and when it exceeded 50,000 lines, it was found that it waited nearly 5 seconds. (https://gigazine.net/gsc_news/en/20191216-spreadsheet-benchmark/)
Quote: all three spreadsheet systems, i.e., Excel, Calc, and Google Sheets, require more than 500ms to sort a spreadsheet with 10k, 6k, and 10k rows, respectively (https://sajjadur.net/files/benchmarking_spreadsheets.pdf)
My spreadsheet editor is not perfect, but it sorts a 44,600-row spreadsheet in about 30 milliseconds. It uses a fast library.
OK, real world scenario. You are using an Explorer-like interface and your ListView needs to show file sizes.
Would a function that takes 10 times longer be OK? No worries for a half-screen of files, but 10,000?
Quote from: NoCforMe on April 17, 2024, 08:22:48 AMwho besides Micro$oft is actually coding a spreadsheet?
[rant]I worked as an on-site tech, the number of people who use Excel for everything would astound you (and make you slightly ill). Weekly budgets where they just add a few rows per week until there are thousands of rows, all automatically refreshing themselves every time data is entered. One lady used it to store recipes :rolleyes: granted there's not much use for numbers there.
Every accountant in every business I've serviced has their own pet spreadsheet, filled with VBA that uses strings for numbers because they are accountants, not programmers.
Because MS made it part of the basic versions of Office, everyone uses and abuses it.
[/rant]
Off topic, but I find it hard to take someone seriously when they use the term Micro$oft.
Makes you sound 14 :biggrin:
[off topic]I'll just sit here and eat my popcorn. [/off topic] :tongue: :joking: :rofl: :badgrin:
To be ON topic, fast enough is fast enough. Always depends on the application of any algo, and how often you need to use it.
Popcorn is always good :thumbsup:
I am trying to get some real data from the UN (https://unstats.un.org/sdgs/indicators/databaseLegacy), but it may take a while.
Quote from: sinsi on April 17, 2024, 09:10:01 AMQuote from: NoCforMe on April 17, 2024, 08:22:48 AMwho besides Micro$oft is actually coding a spreadsheet?
[rant]I worked as an on-site tech, the number of people who use Excel for everything would astound you (and make you slightly ill).
Ackshooly, no it wouldn't and no it wouldn't: the CFO for a company I worked for for many years (in the computer industry) used not Excel, not Lotus 1-2-3 but the Lotus clone, Quattro Pro, for absolutely everything. Including some very clever uses most people would never have thought of. None of which I would consider an "abuse" of that tool (what, you're only supposed to use software for things approved of by the maker and documented in the user manual?)
Quote: Off topic, but I find it hard to take someone seriously when they use the term Micro$oft.
Makes you sound 14 :biggrin:
Well, that's your problem, not mine.
Quote from: sinsi on April 17, 2024, 09:10:01 AMOK, real world scenario. You are using an Explorer-like interface and your ListView needs to show file sizes.
Would a function that takes 10 times longer be OK? No worries for a half-screen of files, but 10,000?
As I have taken pains to write every damn time I bring this up, that is one of the exceptions. Obviously. I still hold that those exceptions are maybe 5-10% of most usage of these functions, if that.
I was hoping JJ's UN database would show the distribution of such usages, but instead it seems to be a collection of wokeness and unattainable goals, quantified somehow.
CSV / TSV tables are in textual format and need conversions.
Also, endian-free tables avoid binary formats.
Quote from: sinsi on April 17, 2024, 09:10:01 AMOK, real world scenario.
Another one: you have a 30k Assembly source, and you are debugging the beast. Will you use MASM (20 seconds) or UAsm (4 seconds) to build it?
Staring four entire seconds at the screen is wayyyy too slow btw. The UAsm developers should try one day to use a fast library (http://masm32.com/board/index.php?topic=94.0) for string parsing and the like :cool:
I know about big-endian and little-endian, but what are "endian free tables"?
Quote from: NoCforMe on April 17, 2024, 05:57:09 PMI know about big-endian and little-endian, but what are "endian free tables"?
Tables of numerical values in text format, for transfers between different systems.
Sometimes in database formats.
jj2007 might have said that many times.
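To illustrate why a text table is "endian free", here is a minimal C sketch. A raw binary dump of the 32-bit value 0x1234ABCD is the byte sequence CD AB 34 12 on a little-endian machine and 12 34 AB CD on a big-endian one, whereas its decimal text is the same characters everywhere; the price is a conversion on each side. The function names write_field and read_field are hypothetical and only meant for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes value as decimal text, e.g. for one CSV field.
   Returns the number of characters written (excluding the NUL). */
int write_field(uint32_t value, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%lu", (unsigned long)value);
}

/* Parses a decimal text field back into a 32-bit value. */
uint32_t read_field(const char *text)
{
    assert(text != NULL);
    return (uint32_t)strtoul(text, NULL, 10);
}
```

Writing 0x1234ABCD this way gives the text 305441741 on any machine, and read_field turns it back into the same number regardless of the reader's byte order.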
Quote from: jj2007 on April 17, 2024, 05:56:54 PMAnother one: you have a 30k Assembly source, and you are debugging the beast. Will you use MASM (20 seconds) or UAsm (4 seconds) to build it?
20 seconds? Maybe you need a new computer, JJ: my current assembly project source is 52+k, and it assembles and links (with ml.exe and link.exe) in a blink of an eye. (ML 6.14.8444) Oh, and that includes the resource compiler & converter.
Quote from: NoCforMe on April 17, 2024, 06:59:46 PMmy current assembly project source is 52+k, and it assembles and links (with ml.exe and link.exe) in a blink of an eye
Lucky you :thumbsup:
My sources are a bit complex. RichMasm, for example, has 25k lines, but a lot of that is macros. The resulting exe is 186,880 bytes, of which only 13k are resources*). For some time, I compared the assembly times (link and rc are always negligible), and ML was typically a factor of 3-5 slower than UAsm, which in turn was about 25% slower than AsmC.
Same for the MasmBasic library at 3.7 seconds with UAsm, over 10 seconds with ML 6.15 - it creates a 154k lib file.
Note that I invested a lot of time to make sure that all my templates build fine with MASM.
They do. Sadly enough, my personal sources don't: ML.exe (any version) can't handle their complexities any more. Often, it fails with "internal error". Bad luck.
*) For comparison, your EdAsm exe has 118,784 bytes, of which 62,440 bytes are resources, i.e. 56,344 net exe bytes for a 10.9 kLines source with a high share of comments
Hi,
Three laptops, Intel processors.
Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)
3276 cycles for 100 * Masm32 hex$
412 cycles for 100 * MasmBasic Hex$
678 cycles for 100 * qWord mmx hex$
686 cycles for 100 * qWord xmm hex$
60721 cycles for 100 * CRT hex$
3294 cycles for 100 * Masm32 hex$
380 cycles for 100 * MasmBasic Hex$
765 cycles for 100 * qWord mmx hex$
686 cycles for 100 * qWord xmm hex$
58533 cycles for 100 * CRT hex$
3286 cycles for 100 * Masm32 hex$
375 cycles for 100 * MasmBasic Hex$
688 cycles for 100 * qWord mmx hex$
679 cycles for 100 * qWord xmm hex$
58595 cycles for 100 * CRT hex$
3267 cycles for 100 * Masm32 hex$
375 cycles for 100 * MasmBasic Hex$
695 cycles for 100 * qWord mmx hex$
675 cycles for 100 * qWord xmm hex$
58223 cycles for 100 * CRT hex$
3266 cycles for 100 * Masm32 hex$
374 cycles for 100 * MasmBasic Hex$
676 cycles for 100 * qWord mmx hex$
675 cycles for 100 * qWord xmm hex$
62799 cycles for 100 * CRT hex$
Averages:
3276 cycles for Masm32 hex$
377 cycles for MasmBasic Hex$
687 cycles for qWord mmx hex$
680 cycles for qWord xmm hex$
59283 cycles for CRT hex$
16 bytes for Masm32 hex$
16 bytes for MasmBasic Hex$
92 bytes for qWord mmx hex$
124 bytes for qWord xmm hex$
29 bytes for CRT hex$
Masm32 hex$ 1234ABCD
MasmBasic Hex$ 1234ABCD
qWord mmx hex$ 1234ABCD
qWord xmm hex$ 1234ABCD
CRT hex$ 1234ABCD
--- ok ---
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
5053 cycles for 100 * Masm32 hex$
1152 cycles for 100 * MasmBasic Hex$
1224 cycles for 100 * qWord mmx hex$
2215 cycles for 100 * qWord xmm hex$
95099 cycles for 100 * CRT hex$
5066 cycles for 100 * Masm32 hex$
1117 cycles for 100 * MasmBasic Hex$
1218 cycles for 100 * qWord mmx hex$
2220 cycles for 100 * qWord xmm hex$
95139 cycles for 100 * CRT hex$
5056 cycles for 100 * Masm32 hex$
1122 cycles for 100 * MasmBasic Hex$
1221 cycles for 100 * qWord mmx hex$
2221 cycles for 100 * qWord xmm hex$
95172 cycles for 100 * CRT hex$
5073 cycles for 100 * Masm32 hex$
1110 cycles for 100 * MasmBasic Hex$
1209 cycles for 100 * qWord mmx hex$
2211 cycles for 100 * qWord xmm hex$
95145 cycles for 100 * CRT hex$
5057 cycles for 100 * Masm32 hex$
1126 cycles for 100 * MasmBasic Hex$
1217 cycles for 100 * qWord mmx hex$
2219 cycles for 100 * qWord xmm hex$
95141 cycles for 100 * CRT hex$
Averages:
5060 cycles for Masm32 hex$
1122 cycles for MasmBasic Hex$
1219 cycles for qWord mmx hex$
2218 cycles for qWord xmm hex$
95142 cycles for CRT hex$
16 bytes for Masm32 hex$
16 bytes for MasmBasic Hex$
92 bytes for qWord mmx hex$
124 bytes for qWord xmm hex$
29 bytes for CRT hex$
Masm32 hex$ 1234ABCD
MasmBasic Hex$ 1234ABCD
qWord mmx hex$ 1234ABCD
qWord xmm hex$ 1234ABCD
CRT hex$ 1234ABCD
--- ok ---
Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz (SSE4)
3186 cycles for 100 * Masm32 hex$
791 cycles for 100 * MasmBasic Hex$
892 cycles for 100 * qWord mmx hex$
669 cycles for 100 * qWord xmm hex$
66380 cycles for 100 * CRT hex$
2895 cycles for 100 * Masm32 hex$
412 cycles for 100 * MasmBasic Hex$
603 cycles for 100 * qWord mmx hex$
579 cycles for 100 * qWord xmm hex$
50447 cycles for 100 * CRT hex$
2569 cycles for 100 * Masm32 hex$
351 cycles for 100 * MasmBasic Hex$
977 cycles for 100 * qWord mmx hex$
1061 cycles for 100 * qWord xmm hex$
72118 cycles for 100 * CRT hex$
3579 cycles for 100 * Masm32 hex$
515 cycles for 100 * MasmBasic Hex$
553 cycles for 100 * qWord mmx hex$
540 cycles for 100 * qWord xmm hex$
51155 cycles for 100 * CRT hex$
2973 cycles for 100 * Masm32 hex$
366 cycles for 100 * MasmBasic Hex$
533 cycles for 100 * qWord mmx hex$
795 cycles for 100 * qWord xmm hex$
57763 cycles for 100 * CRT hex$
Averages:
3018 cycles for Masm32 hex$
431 cycles for MasmBasic Hex$
683 cycles for qWord mmx hex$
681 cycles for qWord xmm hex$
58433 cycles for CRT hex$
16 bytes for Masm32 hex$
16 bytes for MasmBasic Hex$
92 bytes for qWord mmx hex$
124 bytes for qWord xmm hex$
29 bytes for CRT hex$
Masm32 hex$ 1234ABCD
MasmBasic Hex$ 1234ABCD
qWord mmx hex$ 1234ABCD
qWord xmm hex$ 1234ABCD
CRT hex$ 1234ABCD
--- ok ---
Regards,
Steve
Quote from: NoCforMe on April 17, 2024, 08:22:48 AMHow many people here are actually writing applications where it does make a difference in speed? Someone somewhere here mentioned spreadsheets, but really, who besides Micro$oft is actually coding a spreadsheet?
In games without hardware acceleration, speed does matter, especially if you are limited to a 16-bit DOS emulator, which makes x86 code run much slower than on your 3 GHz CPU.
I have had several Android devices with many different Office clone apps; it seems each brand developed its own version of an Office clone.
Btw, the name Quattro Pro sounds more like an Audi car model than a computer program :biggrin:
Quote from: FORTRANS on April 17, 2024, 10:22:22 PMThree laptops, Intel processors.
Thanks a lot, Steve :thup:
I've been playing around with threads and had a crazy idea...multithreaded hex$ :biggrin:
Create 4 suspended threads. Each thread is passed the offset of a byte ([number+i]) and a word ([result+j]).
Fill each thread's byte, start the timer and the threads, and have each thread convert its byte to two ASCII characters.
Wait until all threads finish, get the elapsed time.
Faster?
Quote from: sinsi on April 19, 2024, 12:02:37 PM...multithreaded hex$ :biggrin:
You would have to include the overhead for the thread creation, etc. in the times to be fair to the other algos, i.e. not measure only from thread start to thread end. Are you already experimenting with it?
... Yes, *should* be faster.
You could create the threads and leave them suspended, just wake them up to use them then back to sleep.
Quote from: sudoku on April 19, 2024, 12:19:41 PMAre you already experimenting with it?
I'm waiting for someone to tell me "you're crazy" or "hmmm, interesting".
It's a silly idea as far as using it for something so trivial, but there aren't that many practical uses for threads.
@sinsi: You're crazy. :tongue:
I "might" take a look at this later... "maybe" (I'm lazy)
Quote from: sinsi on April 19, 2024, 12:29:32 PMIt's a silly idea as far as using it for something so trivial...
Right. That's why you would do as many conversions as possible to justify the overhead... not just one conversion. The overhead might negate any savings otherwise. Would be perfect for a spreadsheet, methinks.
Quote from: sinsi on April 19, 2024, 12:29:32 PMYou could create the threads and leave them suspended, just wake them up to use them then back to sleep.
Quote from: sudoku on April 19, 2024, 12:19:41 PMAre you already experimenting with it?
I'm waiting for someone to tell me "you're crazy" or "hmmm, interesting".
It's a silly idea as far as using it for something so trivial, but there aren't that many practical uses for threads.
Interesting. I tried SIMT using one worker thread that adds together very big Fibonacci numbers, while the main thread takes care of printing the numbers, which takes milliseconds.
So I took the printing out of the loop that adds the Fibonacci numbers, i.e. I removed the slowest part, which takes milliseconds, while the rest of the loop takes clock cycles.
I also tried a worker thread in a Windows program and timed it: it took 10 times more clock cycles than the PeekMessage method.
Quote from: sinsi on April 19, 2024, 12:02:37 PMFaster?
Thread overhead is far too high for individual numbers. However, if you have a gigabyte to convert, splitting the task will certainly be faster.
Create the threads beforehand (like making a lookup table, once is enough).
Each thread suspends itself after the calculation by setting a bit to say "I've finished" and calling SuspendThread.
The thread waits for the next job via ResumeThread.
A more realistic use might be to fill a huge block of memory?
Quote from: sinsi on April 19, 2024, 06:11:37 PMA more realistic use might be to fill a huge block of memory?
Yes indeed. And each thread working on "its" slots, i.e.
thread #1 working on start+16*n+0
thread #2 working on start+16*n+4
thread #3 working on start+16*n+8
thread #4 working on start+16*n+12
That would ensure good usage of the L1 cache while keeping busy a significant share of the CPU. I wonder, though, about cache misses if one thread lags behind. That is, if there can be a significant speed difference.
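The slot scheme above can be sketched in C as follows. This is only an illustration under assumptions: the names hex32_to and convert_slots are made up, the source is taken to be an array of DWORDs (so byte offset start+16*n+4*k is element 4*n+k), each result is 8 hex characters, and the actual thread creation is left out so that the partitioning itself is easy to check. Each worker k would run convert_slots with its own number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { WORKERS = 4 };   /* four threads, as in the post above */

/* Writes exactly 8 uppercase hex digits to out (no terminating NUL). */
void hex32_to(uint32_t value, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 7; i >= 0; --i) {
        out[i] = digits[value & 0xFu];
        value >>= 4;
    }
}

/* Worker k converts elements k, k+4, k+8, ... of src,
   i.e. the DWORDs at byte offsets 16*n + 4*k. */
void convert_slots(const uint32_t *src, size_t count,
                   char *dst, unsigned worker)
{
    assert(worker < WORKERS);
    for (size_t i = worker; i < count; i += WORKERS)
        hex32_to(src[i], dst + 8 * i);   /* each result has its own 8 bytes */
}
```

No two workers ever write the same output bytes, so no locking is needed for the results. Whether this interleaving beats giving each thread one contiguous block is something that would have to be timed.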
I would like to start a new thread for SIMT exercises + timings.
A Sleep in the main thread, after starting x threads, suggests that the CPU switch to another thread.
Should the program decide the number of threads after checking the CPU's number of cores? We have very different CPUs with different numbers of cores.
I am curious whether CreateThread can start execution directly, with a custom bigger stack size so that local arrays can be used, versus several invokes:
invoke CreateThread (suspended)
invoke alloc memory
invoke start thread?
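On the question of choosing the number of threads from the number of cores, a minimal C sketch could look like this. The function name worker_count is hypothetical. On Windows it uses GetSystemInfo, which reports logical processors; elsewhere it assumes sysconf(_SC_NPROCESSORS_ONLN) is available, which is a common extension rather than a guarantee.

```c
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Returns the number of logical processors, or 1 if it cannot be determined. */
unsigned worker_count(void)
{
    unsigned count;
#ifdef _WIN32
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    count = si.dwNumberOfProcessors ? (unsigned)si.dwNumberOfProcessors : 1u;
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);   /* assumed to be supported */
    count = n > 0 ? (unsigned)n : 1u;
#endif
    assert(count >= 1u);
    return count;
}
```

A program could then create worker_count() threads, or fewer if the job is small. Whether one thread per logical processor is the best choice for this kind of work is an open question that the timings would have to answer.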
Quote from: daydreamer on April 20, 2024, 12:15:06 AMI would like to start a new thread for SIMT exercises + timings
Sleep in main thread suggest cpu switches to another thread after starting x number of threads
Program Decides # threads after check cpu # of cores?, because we have very different cpu's with different # of cores
Post some code and we'll test it, Magnus...
We're always up for a new speed test/challenge...