Very fast hex$ (our hobby is beating the CRT)

Started by jj2007, April 17, 2024, 02:06:33 AM


jj2007

Popcorn is always good :thumbsup:

I am trying to get some real data from the UN, but it may take a while.

NoCforMe

Quote from: sinsi on April 17, 2024, 09:10:01 AM
Quote from: NoCforMe on April 17, 2024, 08:22:48 AM
who besides Micro$oft is actually coding a spreadsheet?
[rant]I worked as an on-site tech, the number of people who use Excel for everything would astound you (and make you slightly ill).
Ackshooly, no it wouldn't and no it wouldn't: the CFO for a company I worked for for many years (in the computer industry) used not Excel, not Lotus 1-2-3 but the Lotus clone, Quattro Pro, for absolutely everything. Including some very clever uses most people would never have thought of. None of which I would consider an "abuse" of that tool (what, you're only supposed to use software for things approved of by the maker and documented in the user manual?)
Quote
Off topic, but I find it hard to take someone seriously when they use the term Micro$oft.
Makes you sound 14 :biggrin:
Well, that's your problem, not mine.
Assembly language programming should be fun. That's why I do it.

NoCforMe

Quote from: sinsi on April 17, 2024, 09:10:01 AM
OK, real world scenario. You are using an Explorer-like interface and your ListView needs to show file sizes.
Would a function that takes 10 times longer be OK? No worries for a half-screen of files, but 10,000?
As I have taken pains to write every damn time I bring this up, that is one of the exceptions. Obviously. I still hold that those exceptions are maybe 5-10% of most usage of these functions, if that.

I was hoping JJ's UN database would show the distribution of such usages, but instead it seems to be a collection of wokeness and unattainable goals, quantified somehow.

TimoVJL

CSV / TSV tables are in textual format and need conversion.
Also, endian-free tables avoid binary formats.
May the source be with you

jj2007

Quote from: sinsi on April 17, 2024, 09:10:01 AM
OK, real world scenario.

Another one: you have a 30k Assembly source, and you are debugging the beast. Will you use MASM (20 seconds) or UAsm (4 seconds) to build it?

Staring four entire seconds at the screen is wayyyy too slow btw. The UAsm developers should try one day to use a fast library for string parsing and the like :cool:

NoCforMe

I know about big-endian and little-endian, but what are "endian free tables"?

TimoVJL

Quote from: NoCforMe on April 17, 2024, 05:57:09 PM
I know about big-endian and little-endian, but what are "endian free tables"?
Tables of numerical values in text format, used for transfers between different systems.
Sometimes they appear in database formats.

jj2007 might have said that many times.

NoCforMe

Quote from: jj2007 on April 17, 2024, 05:56:54 PM
Another one: you have a 30k Assembly source, and you are debugging the beast. Will you use MASM (20 seconds) or UAsm (4 seconds) to build it?
20 seconds? Maybe you need a new computer, JJ: my current assembly project source is 52+k, and it assembles and links (with ml.exe and link.exe) in a blink of an eye. (ML 6.14.8444) Oh, and that includes the resource compiler & converter.

jj2007

Quote from: NoCforMe on April 17, 2024, 06:59:46 PM
my current assembly project source is 52+k, and it assembles and links (with ml.exe and link.exe) in a blink of an eye

Lucky you :thumbsup:

My sources are a bit complex. RichMasm, for example, has 25k lines, but a lot of that is macros. The resulting exe is 186,880 bytes, of which only 13k are resources*). For some time, I compared the assembly times (link and rc are always negligible), and ML was typically a factor of 3-5 slower than UAsm, which in turn was about 25% slower than AsmC.

Same for the MasmBasic library at 3.7 seconds with UAsm, over 10 seconds with ML 6.15 - it creates a 154k lib file.

Note that I invested a lot of time to make sure that all my templates build fine with MASM. They do. Sadly enough, my personal sources don't: ML.exe (any version) can't handle their complexities any more. Often, it fails with "internal error". Bad luck.

*) For comparison, your EdAsm exe has 118,784 bytes, of which 62,440 bytes are resources, i.e. 56,344 net exe bytes for a 10.9 kLines source with a high share of comments

FORTRANS

Hi,

   Three laptops, Intel processors.

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)

3276 cycles for 100 * Masm32 hex$
412 cycles for 100 * MasmBasic Hex$
678 cycles for 100 * qWord mmx hex$
686 cycles for 100 * qWord xmm hex$
60721 cycles for 100 * CRT hex$

3294 cycles for 100 * Masm32 hex$
380 cycles for 100 * MasmBasic Hex$
765 cycles for 100 * qWord mmx hex$
686 cycles for 100 * qWord xmm hex$
58533 cycles for 100 * CRT hex$

3286 cycles for 100 * Masm32 hex$
375 cycles for 100 * MasmBasic Hex$
688 cycles for 100 * qWord mmx hex$
679 cycles for 100 * qWord xmm hex$
58595 cycles for 100 * CRT hex$

3267 cycles for 100 * Masm32 hex$
375 cycles for 100 * MasmBasic Hex$
695 cycles for 100 * qWord mmx hex$
675 cycles for 100 * qWord xmm hex$
58223 cycles for 100 * CRT hex$

3266 cycles for 100 * Masm32 hex$
374 cycles for 100 * MasmBasic Hex$
676 cycles for 100 * qWord mmx hex$
675 cycles for 100 * qWord xmm hex$
62799 cycles for 100 * CRT hex$

Averages:
3276 cycles for Masm32 hex$
377 cycles for MasmBasic Hex$
687 cycles for qWord mmx hex$
680 cycles for qWord xmm hex$
59283 cycles for CRT hex$

16 bytes for Masm32 hex$
16 bytes for MasmBasic Hex$
92 bytes for qWord mmx hex$
124 bytes for qWord xmm hex$
29 bytes for CRT hex$

Masm32 hex$      1234ABCD
MasmBasic Hex$   1234ABCD
qWord mmx hex$   1234ABCD
qWord xmm hex$   1234ABCD
CRT hex$         1234ABCD

--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

5053 cycles for 100 * Masm32 hex$
1152 cycles for 100 * MasmBasic Hex$
1224 cycles for 100 * qWord mmx hex$
2215 cycles for 100 * qWord xmm hex$
95099 cycles for 100 * CRT hex$

5066 cycles for 100 * Masm32 hex$
1117 cycles for 100 * MasmBasic Hex$
1218 cycles for 100 * qWord mmx hex$
2220 cycles for 100 * qWord xmm hex$
95139 cycles for 100 * CRT hex$

5056 cycles for 100 * Masm32 hex$
1122 cycles for 100 * MasmBasic Hex$
1221 cycles for 100 * qWord mmx hex$
2221 cycles for 100 * qWord xmm hex$
95172 cycles for 100 * CRT hex$

5073 cycles for 100 * Masm32 hex$
1110 cycles for 100 * MasmBasic Hex$
1209 cycles for 100 * qWord mmx hex$
2211 cycles for 100 * qWord xmm hex$
95145 cycles for 100 * CRT hex$

5057 cycles for 100 * Masm32 hex$
1126 cycles for 100 * MasmBasic Hex$
1217 cycles for 100 * qWord mmx hex$
2219 cycles for 100 * qWord xmm hex$
95141 cycles for 100 * CRT hex$

Averages:
5060 cycles for Masm32 hex$
1122 cycles for MasmBasic Hex$
1219 cycles for qWord mmx hex$
2218 cycles for qWord xmm hex$
95142 cycles for CRT hex$

16 bytes for Masm32 hex$
16 bytes for MasmBasic Hex$
92 bytes for qWord mmx hex$
124 bytes for qWord xmm hex$
29 bytes for CRT hex$

Masm32 hex$      1234ABCD
MasmBasic Hex$   1234ABCD
qWord mmx hex$   1234ABCD
qWord xmm hex$   1234ABCD
CRT hex$         1234ABCD

--- ok ---

Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz (SSE4)

3186 cycles for 100 * Masm32 hex$
791 cycles for 100 * MasmBasic Hex$
892 cycles for 100 * qWord mmx hex$
669 cycles for 100 * qWord xmm hex$
66380 cycles for 100 * CRT hex$

2895 cycles for 100 * Masm32 hex$
412 cycles for 100 * MasmBasic Hex$
603 cycles for 100 * qWord mmx hex$
579 cycles for 100 * qWord xmm hex$
50447 cycles for 100 * CRT hex$

2569 cycles for 100 * Masm32 hex$
351 cycles for 100 * MasmBasic Hex$
977 cycles for 100 * qWord mmx hex$
1061 cycles for 100 * qWord xmm hex$
72118 cycles for 100 * CRT hex$

3579 cycles for 100 * Masm32 hex$
515 cycles for 100 * MasmBasic Hex$
553 cycles for 100 * qWord mmx hex$
540 cycles for 100 * qWord xmm hex$
51155 cycles for 100 * CRT hex$

2973 cycles for 100 * Masm32 hex$
366 cycles for 100 * MasmBasic Hex$
533 cycles for 100 * qWord mmx hex$
795 cycles for 100 * qWord xmm hex$
57763 cycles for 100 * CRT hex$

Averages:
3018 cycles for Masm32 hex$
431 cycles for MasmBasic Hex$
683 cycles for qWord mmx hex$
681 cycles for qWord xmm hex$
58433 cycles for CRT hex$

16 bytes for Masm32 hex$
16 bytes for MasmBasic Hex$
92 bytes for qWord mmx hex$
124 bytes for qWord xmm hex$
29 bytes for CRT hex$

Masm32 hex$      1234ABCD
MasmBasic Hex$   1234ABCD
qWord mmx hex$   1234ABCD
qWord xmm hex$   1234ABCD
CRT hex$         1234ABCD

--- ok ---

Regards,

Steve

daydreamer

Quote from: NoCforMe on April 17, 2024, 08:22:48 AM
How many people here are actually writing applications where it does make a difference in speed? Someone somewhere here mentioned spreadsheets, but really, who besides Micro$oft is actually coding a spreadsheet?
In non-hardware-accelerated games speed does matter, especially if you are limited to a 16-bit DOS emulator, which makes x86 code run much slower than on your 3 GHz CPU.
I have had several Android devices with many different office clone apps; it seems each brand developed its own version of an office clone.
Btw the name Quattro Pro sounds more like an Audi car model than a computer program  :biggrin:
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

sinsi

I've been playing around with threads and had a crazy idea...multithreaded hex$  :biggrin:

Create 4 suspended threads. Each thread is passed the offset of a byte ([number+i]) and a word ([result+j]).
Fill each thread's byte, start the timer and the threads and have them convert it to two ASCII characters.
Wait until all threads finish, get the elapsed time.

Faster?

zedd151

Quote from: sinsi on April 19, 2024, 12:02:37 PM
...multithreaded hex$  :biggrin:
You would have to include the overhead for the thread creation, etc. in the times to be fair to the other algos, i.e. not only from thread(s) start to thread(s) end.  Are you already experimenting with it?
...  Yes, *should* be faster.

sinsi

You could create the threads and leave them suspended, just wake them up to use them then back to sleep.

Quote from: sudoku on April 19, 2024, 12:19:41 PM
Are you already experimenting with it?
I'm waiting for someone to tell me "you're crazy" or "hmmm, interesting".
It's a silly idea as far as using it for something so trivial, but there aren't that many practical uses for threads.