64-bit vs 32-bit benchmarks

jj2007 · May 21, 2016, 09:45:01 PM

When testing new versions of assemblers of the Watcom family, especially HJWasm, it seemed that the improvement of 64-bit code over 32-bit code is, ehm, negative. That inspired me to check how the situation was when the computing world jumped from 16-bit to 32-bit code. So I googled around and noticed that it is remarkably difficult to find valid comparisons.

Below is what I found so far, which indicates that 32-bit code is roughly 5-10 times faster than equivalent 16-bit code on identical hardware. Given that the addressable memory jumped from a ridiculous 64 kBytes to a generous 2 gigabytes, there was certainly an incentive to go for 32-bit code 8)

Dhrystone Benchmark (Dhry1, optimized), 32/16 ratio = 7.7
Core i7 4820K, 32-bit results: 14136 VAX MIPS
Core i7 4820K (3700 MHz), 16-bit results: 1832 VAX MIPS

Operating system and compiler influences on benchmarking
OS:            DOS 16bit   DOS 32bit  32/16
Compiler:    Watcom-10.5 Watcom-10.5  ratio
-------------------------------------------
NUMERIC SORT       3.792      30.723    8.1
STRING SORT        0.463       2.035    4.4
BITFIELD        409105.0   5030838.0   12.3
FP EMULATI         0.837       2.096    2.5
FOURIER          451.840     499.653    1.1
ASSIGNMENT         0.036       0.193    5.4
IDEA              23.224      51.922    2.2
HUFFMAN            7.499      17.083    2.3
NEURAL NET         0.030       0.250    8.3
LU DECOMPOS        0.764       8.452   11.1
-------------------------------------------
MEM INDEX          0.025       0.169    6.8
INT INDEX          0.067       0.215    3.2
FP INDEX           0.055       0.258    4.7
-------------------------------------------
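
The ratio column can be re-derived from the two score columns. What follows is a minimal Python sketch, not part of the original benchmark output. The scores are copied from the table above; the helper names `ratios` and `geometric_mean` are made up for illustration. The geometric mean is used here only as one common way to summarize a set of speed-up ratios.

```python
import math

# (test name, 16-bit score, 32-bit score), copied from the table above.
RESULTS = [
    ("NUMERIC SORT", 3.792, 30.723),
    ("STRING SORT", 0.463, 2.035),
    ("BITFIELD", 409105.0, 5030838.0),
    ("FP EMULATI", 0.837, 2.096),
    ("FOURIER", 451.840, 499.653),
    ("ASSIGNMENT", 0.036, 0.193),
    ("IDEA", 23.224, 51.922),
    ("HUFFMAN", 7.499, 17.083),
    ("NEURAL NET", 0.030, 0.250),
    ("LU DECOMPOS", 0.764, 8.452),
]


def ratios(results):
    """Return {test name: 32-bit score / 16-bit score}."""
    return {name: s32 / s16 for name, s16, s32 in results}


def geometric_mean(values):
    """Geometric mean of a collection of positive ratios."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    per_test = ratios(RESULTS)
    for name, value in per_test.items():
        print(f"{name:<12} {value:5.1f}")
    print(f"geometric mean: {geometric_mean(per_test.values()):.1f}")
```

The per-test spread is wide (FOURIER barely moves, BITFIELD gains more than tenfold), so a single summary number hides a lot.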

rrr314159 · May 22, 2016, 12:07:56 AM

jj2007: improvement of 64-bit code over 32-bit code is, ehm, negative.

- That's a remarkably polite way to put it!

My experience comparing the two was pretty extensive (for a newbie). Speed-wise, the difference was minor in the cases I dealt with (a small subset of all cases that exist, of course). For normal programming, 32-bit was a little faster, I think due to the reduced code size (better for the cache, of course). But I'm particularly interested in "complicated" CPU-intensive algos which benefit from those extra registers. In those cases I got a few percent improvement from 64-bit. Undoubtedly one could find an algo which really needs the extra registers and would improve significantly more, but it would be a rare case; I know of none. However, the enhanced ease of programming is a big plus.

I also worked on algos which benefit from the registers' extra width, primarily large primes (which I never posted; there seemed to be no interest). In the 10^18 range I was getting a significant improvement, 20% or so. However, there's a counterexample: Chris Wallich, using C++ and working in this area far more than I, said the improvement was only a few percent. I don't know why. Anyway, I think that (as with the extra registers) the right algo will benefit significantly from that extra width.
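
A rough illustration of why register width can matter for numbers near 10^18, offered as an assumption and not as the code discussed in this post: a 64x64-bit multiply yields a product of up to 128 bits. On x86-64 a single MUL leaves that product in RDX:RAX, while 32-bit code has to assemble it from four 32x32-bit partial products plus carries. The Python sketch below only models that limb arithmetic; the names `mul64_via_32bit_limbs` and `mulmod` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mul64_via_32bit_limbs(a, b):
    """Multiply two unsigned 64-bit values using only 32x32 -> 64-bit products.

    This mirrors what 32-bit code has to do; 64-bit code gets the same
    128-bit result from one widening multiply.
    """
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    # Four partial products, each of which fits in 64 bits.
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi

    # Recombine with the proper weights (2^0, 2^32, 2^64).
    return lo_lo + ((lo_hi + hi_lo) << 32) + (hi_hi << 64)


def mulmod(a, b, m):
    """(a * b) mod m, the core step of modular primality tests."""
    return mul64_via_32bit_limbs(a, b) % m


if __name__ == "__main__":
    # Arbitrary operands just below 10^18; they are not claimed to be prime.
    a = 999_999_999_999_999_989
    b = 999_999_999_999_999_877
    print(mul64_via_32bit_limbs(a, b) == a * b)
```

How much of this shows up in wall-clock time depends on how much of the inner loop is multiplication and reduction, which may be one reason reported gains differ.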

Finally, if you're using the large address space a lot, surely 64-bit is going to be faster; I have almost no experience with that.

Apart from such specialized work, I concluded 32-bit was a little faster. However, in another post you reported (iirc) a much greater speed hit from 64-bit. I would tentatively guess that's an artifact, and that if you put more work into it you'd get 64-bit to perform almost as well as 32-bit. Of course the obvious question is, why should you bother?

About code size: if you don't use LARGEADDRESSAWARE, 64-bit is only 10 or 20 percent larger. There are negatives (qWord doesn't like it), but it works fine on my personal machines. I think it might contribute, however, to porting difficulties.

Overall, 64-bit is something of a disaster, I'm afraid; but it's Microsoft's fault, not Intel's. 64-bit, in my limited experience, is far less compatible because (IMHO) Microsoft dropped the ball with their poor ABI and poor support for us poor assembler programmers! But for C++ (and, I suppose, other HLLs) it seems 64-bit is "a good thing" (as Martha Stewart used to say), because MS supports it thoroughly while effectively deprecating 32-bit.

K_F · May 22, 2016, 05:37:07 AM

The 64/32-bit comparisons have been known since the Intel i860 days (https://en.wikipedia.org/wiki/Intel_i860).
It has always just been a matter of how they implement the data/instruction buses.

TWell · May 22, 2016, 04:52:37 PM

In my tests, the bigger problem was the crt/crtmt libraries.

 6.352s  hjwasm322005DDK.exe    libc.lib
18.498s  hjwasm322005DDKmt.exe  libcmt.lib
 7.100s  hjwasm642005DDK.exe    libc.lib
14.751s  hjwasm642005DDKmt.exe  libcmt.lib

I can't figure out how to release the handbrake in the mt library ;)
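
To put numbers on that handbrake, here is a small Python sketch that turns the four timings into slowdown factors. It assumes the `32` and `64` in the executable names denote the 32-bit and 64-bit builds; the dictionary layout and the function name `mt_slowdown` are illustrative only.

```python
# Seconds per run, copied from the timings above.
# Assumption: "hjwasm32..." is the 32-bit build and "hjwasm64..." the 64-bit one.
TIMINGS = {
    ("32-bit", "libc"): 6.352,
    ("32-bit", "libcmt"): 18.498,
    ("64-bit", "libc"): 7.100,
    ("64-bit", "libcmt"): 14.751,
}


def mt_slowdown(timings, bits):
    """How many times slower the libcmt build is than the libc build."""
    return timings[(bits, "libcmt")] / timings[(bits, "libc")]


def bitness_slowdown(timings, crt):
    """How many times slower the 64-bit build is than the 32-bit one."""
    return timings[("64-bit", crt)] / timings[("32-bit", crt)]


if __name__ == "__main__":
    for bits in ("32-bit", "64-bit"):
        print(f"{bits}: libcmt / libc = {mt_slowdown(TIMINGS, bits):.2f}")
    print(f"libc: 64-bit / 32-bit = {bitness_slowdown(TIMINGS, 'libc'):.2f}")
```

On these figures the choice of runtime library changes the run time far more than the choice between 32-bit and 64-bit does.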

The MASM Forum
