Hi,
This archive (http://www.webalice.it/jj2006/pics/SwitchBenchmark.zip) contains two versions of roughly the same code:
- fbsl.exe BenchJJ.fbs (i.e. drag the fbs over the exe)
- BenchJJ.exe
They both have a Switch/Case loop whose disassembly is in the two *.txt files.
Now the odd thing is that one takes 1,000 ms and the other only about 700 ms on my machine, despite their identical appearance and identical (mis-)alignment.
Before jumping to theories and conclusions, I'd like to see a few timings. The code is not mine, but I am very confident that it's free of malware (more (http://forum.basicprogramming.org/index.php/topic,5013.msg31443.html#msg31443)).
Relevant excerpt:
Invoke GetTickCount
mov ebx, eax // get initial ticks
.Repeat
    inc swVar
    .If swVar > 100
        xor swVar, swVar
    .EndIf
    .If swVar = 0
        inc ct0
        jmp @F
    .EndIf
    .If swVar = 1
        inc ct1
        jmp @F
    .EndIf
    .If swVar = 2
        inc ct2
        jmp @F
    .EndIf
    .If swVar = 4
        inc ct4
        jmp @F
    .EndIf
    inc ctDef
@@
    inc loopCt
    ; int 3
.Until loopCt > 200000000
Invoke GetTickCount // get current ticks
sub eax, ebx // calc tick delta
Off-topic, but I would like to thank you for providing the link to the BASIC language programming dev forum, basicprogramming.org. It is exactly what I have been looking for in connection with my own project, and I can hardly believe I hadn't at least stumbled over it before now.
I see a difference of a couple hundred ms:
fbsl with no Firefox running: ~950 ms
fbsl with Firefox running: ~1200 ms
BenchJJ with no Firefox running: ~780 ms
BenchJJ with Firefox running: ~950 ms
Seems a little screwy - lol
The fbsl program does not allow copy/paste of text.
Maybe you could write a more convenient test bed;
that is probably why you're not seeing more replies.
On my slow AMD, they take 2200 vs. 1700 ms.
I often see speed differences that are hard to pin down, which is very annoying when one is coding specifically for speed. For instance, after the timing loops I print out various arrays. It turned out that changing the printouts (because I'm trying to pin down the speed differences) changes the speeds! Throwing in lots of align statements (especially around inner loops, naturally) partly cures it. Obviously one should use a real-time OS, or maybe plain DOS, for critical timings, but that is too much trouble (I don't have a lot of time for programming these weeks).
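A minimal sketch of one common way to separate OS noise from real code effects: repeat each measurement several times and compare the minimum and the spread rather than a single reading. It is written in Python purely for illustration. The names time_runs, summarize and switch_loop are hypothetical, and switch_loop is only a scaled-down stand-in for the Switch/Case loop, not the benchmark itself. Repetition filters out background activity (such as a running browser); it does not remove alignment effects, which are a property of the code layout.

```python
import statistics
import time


def time_runs(func, repeats=7):
    """Call func() `repeats` times; return each run's duration in milliseconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(durations):
    """Return (minimum, median, spread) for a list of durations."""
    return min(durations), statistics.median(durations), max(durations) - min(durations)


def switch_loop(iterations=200_000):
    """Hypothetical stand-in for the Switch/Case loop, scaled down."""
    counts = {0: 0, 1: 0, 2: 0, 4: 0, "default": 0}
    sw_var = 0
    for _ in range(iterations):
        sw_var += 1
        if sw_var > 100:          # wrap around, as the excerpt does
            sw_var = 0
        key = sw_var if sw_var in (0, 1, 2, 4) else "default"
        counts[key] += 1
    return counts


if __name__ == "__main__":
    fastest, median, spread = summarize(time_runs(switch_loop))
    print(f"min {fastest:.1f} ms, median {median:.1f} ms, spread {spread:.1f} ms")
```

If the minimum stays stable across sessions while the spread changes, the variation is most likely background load; if the minimum itself moves after an unrelated code change, code placement is a more plausible suspect.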
Thanks, folks.
It looks as if Mike had the right idea: instruction cache boundary problems (http://forum.basicprogramming.org/index.php/topic,5013.msg31445.html#msg31445).
In fact, if I insert nineteen invoke GetTickCount calls plus two nops in line 28 of BenchJJ.fbs (after mov edi, 5), the boundaries are shifted down far enough to produce a dramatic drop from 1,000 to 604 milliseconds. Umph 8)
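The arithmetic behind this kind of padding can be sketched as follows. This is an illustration only: the addresses and sizes are made up, the 16-byte and 64-byte boundary sizes are assumptions rather than measured values for any machine in this thread, and the helper names are hypothetical.

```python
def padding_to_boundary(address, boundary=16):
    """Bytes of padding needed to move `address` up to the next multiple of `boundary`."""
    return (-address) % boundary


def boundaries_crossed(start, length, boundary=64):
    """Count the `boundary`-byte block edges inside the range [start, start + length)."""
    if length <= 0:
        return 0
    last = start + length - 1
    return last // boundary - start // boundary


if __name__ == "__main__":
    loop_start = 0x40103A   # hypothetical address of the loop's first instruction
    loop_size = 0x50        # hypothetical size of the loop body in bytes

    pad = padding_to_boundary(loop_start, 16)
    print("padding bytes needed:", pad)
    print("64-byte edges crossed before padding:",
          boundaries_crossed(loop_start, loop_size, 64))
    print("64-byte edges crossed after padding:",
          boundaries_crossed(loop_start + pad, loop_size, 64))
```

With the real loop address and size taken from the disassembly, the same two functions show how many filler bytes would be needed, which is what an align statement in front of the loop computes automatically.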
I bet that if you now modify the code - even after the affected area - you'll usually have to re-align. And on another machine - even a similar one - it will be different. Is there a systematic way to handle this? For instance, a system call that tells you what type of instruction cache the machine has, and where the boundaries are?
Another question: do the best C++ compilers (Visual C++, I suppose) handle this problem? I don't hear such complaints from their users.
Good question. So far we have never run into this problem, maybe because our proggies are small. In contrast, fbsl.exe is half a megabyte, so it is easy to fill a 32 KB or 64 KB instruction cache. Still, the exact mechanism is not very clear to me. But obviously, one nop more can make a really big difference.