The MASM Forum

General => The Laboratory => Topic started by: jj2007 on August 27, 2015, 11:44:41 PM

Title: Weird speed differences for seemingly identical code
Post by: jj2007 on August 27, 2015, 11:44:41 PM
Hi,

This archive (http://www.webalice.it/jj2006/pics/SwitchBenchmark.zip) contains two versions of roughly the same code:
- fbsl.exe BenchJJ.fbs (i.e. drag the fbs over the exe)
- BenchJJ.exe

They both have a Switch/Case loop whose disassembly is in the two *.txt files.
Now the odd thing is that the FBSL version takes about 1,000 ms while BenchJJ.exe takes only about 700 ms on my machine, despite their identical look and identical (mis-)alignment.

Before jumping to theories and conclusions, I'd like to see a few timings. The code is not mine, but I am very confident that it's free of malware (more (http://forum.basicprogramming.org/index.php/topic,5013.msg31443.html#msg31443)).

Relevant excerpt:
    Invoke GetTickCount
    mov ebx, eax // get initial ticks
    .Repeat
      inc swVar
      .If swVar > 100
        xor swVar, swVar
      .EndIf
     
      .If swVar = 0
        inc ct0
        jmp @F
      .EndIf
      .If swVar = 1
        inc ct1
        jmp @F
      .EndIf
      .If swVar = 2
        inc ct2
        jmp @F
      .EndIf
      .If swVar = 4
        inc ct4
        jmp @F
      .EndIf
      inc ctDef
      @@
     
      inc loopCt
      ; int 3
    .Until loopCt > 200000000
   
    Invoke GetTickCount // get current ticks
    sub eax, ebx // calc tick delta
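
For reference, the counter values this loop should end with can be worked out without running the benchmark. The following is a minimal Python model, not part of the archive. It assumes that swVar and loopCt both start at 0, which the excerpt does not show, and the function name simulate is made up for illustration.

```python
def simulate(limit, start=0):
    """Model the .Repeat/.Until loop from the excerpt with a small limit."""
    sw_var = start   # assumption: the initial value of swVar is not shown
    loop_ct = 0      # assumption: loopCt starts at 0
    counts = {"ct0": 0, "ct1": 0, "ct2": 0, "ct4": 0, "ctDef": 0}
    while True:
        sw_var += 1              # inc swVar
        if sw_var > 100:         # .If swVar > 100
            sw_var = 0           #   xor swVar, swVar
        if sw_var == 0:
            counts["ct0"] += 1
        elif sw_var == 1:
            counts["ct1"] += 1
        elif sw_var == 2:
            counts["ct2"] += 1
        elif sw_var == 4:
            counts["ct4"] += 1
        else:
            counts["ctDef"] += 1  # fall-through case: inc ctDef
        loop_ct += 1              # inc loopCt
        if loop_ct > limit:       # .Until loopCt > limit
            break
    return counts


# 1010 iterations = 10 full cycles of the 101 possible swVar values
print(simulate(1009))
```

Under these assumptions swVar cycles through 101 values, so 97 of every 101 iterations (about 96%) fall through all four compares to ctDef, and the timing is dominated by that fall-through path.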
Title: Re: Weird speed differences for seemingly identical code
Post by: satpro on August 28, 2015, 01:38:30 AM
Off-topic, but I would like to thank you for providing the link to the BASIC language programming dev forum, basicprogramming.org. It is exactly what I have been looking for with my own project, and I almost can't believe I haven't at least stumbled over it before now. :t
Title: Re: Weird speed differences for seemingly identical code
Post by: dedndave on August 28, 2015, 02:10:57 AM
i see a couple hundred ms difference

fbsl with no firefox running ~950 ms
fbsl with firefox running ~1200 ms

benchjj with no firefox running ~780 ms
benchjj with firefox running ~950 ms

seems a little screwy - lol
the fbsl program does not allow copy/paste of text
maybe you could write a more convenient test bed
probably why you're not seeing more replies
Title: Re: Weird speed differences for seemingly identical code
Post by: rrr314159 on August 28, 2015, 03:19:56 AM
On my slow AMD, they take 2200 vs. 1700 ms.

I often see speed differences that are hard to pin down; very annoying when one is coding for speed in particular. For instance, after the timing loops I print out various arrays. It turned out that changing the printouts (because I'm trying to pin down speed diffs) changes the speeds! It's partly cured by throwing in lots of align statements (especially before inner loops, naturally). Obviously one should use a real-time OS, maybe simple DOS, for critical timings, but that's too much trouble (I don't have a lot of time for programming these weeks).
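
As a rough illustration of what an align statement does: the assembler pads with filler bytes up to the next multiple of the requested power of two. Here is a minimal Python sketch of that arithmetic. The function name align_padding and the offsets are made up for illustration and are not taken from the benchmark.

```python
def align_padding(offset, alignment=16):
    """Bytes of padding needed so that `offset` becomes a multiple of `alignment`."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (-offset) % alignment


# hypothetical code offsets, chosen only to show the three typical cases
for off in (0x401000, 0x401003, 0x40100F):
    print(hex(off), "->", align_padding(off), "padding bytes")
```

Since padding inserted before one loop moves everything that follows it, aligning one label can change the alignment of later code, which would fit the observation that unrelated edits change the timings.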
Title: Re: Weird speed differences for seemingly identical code
Post by: jj2007 on August 28, 2015, 04:15:34 AM
Thanks, folks.

It looks as if Mike had the right idea: instruction cache boundary problems (http://forum.basicprogramming.org/index.php/topic,5013.msg31445.html#msg31445).
In fact, if I insert nineteen invoke GetTickCount plus two nops at line 28 of BenchJJ.fbs (after mov edi, 5), the boundaries shift far enough to drop the timing dramatically, from 1,000 to 604 milliseconds. Umph 8)
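
The effect of shifting code can be sketched with simple address arithmetic. The Python sketch below counts how many aligned blocks a loop of a given length touches before and after padding is inserted ahead of it. The start address, loop length, padding size and 32-byte block size are all hypothetical; the real figures depend on the encoding of the inserted instructions and on the CPU, neither of which is given in this thread.

```python
def blocks_touched(start, length, block=32):
    """Number of `block`-byte aligned blocks overlapped by [start, start + length)."""
    if length <= 0:
        return 0
    first = start // block
    last = (start + length - 1) // block
    return last - first + 1


loop_start = 0x40101A   # hypothetical address of the loop's first byte
loop_len = 60           # hypothetical loop size in bytes
for pad in (0, 6):      # hypothetical padding inserted ahead of the loop
    addr = loop_start + pad
    print(f"pad={pad}: start={addr:#x}, blocks={blocks_touched(addr, loop_len)}")
```

With these made-up numbers, six bytes of padding take the loop from three blocks to two. This arithmetic does not establish that the same thing explains the drop from 1,000 to 604 ms; it only shows how a small shift can change the layout.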
Title: Re: Weird speed differences for seemingly identical code
Post by: rrr314159 on August 28, 2015, 04:43:48 AM
I bet if you now modify the code - even after the affected area - you'll have to re-align (usually). And, on another machine - even a similar one - it will be different. Is there a systematic way to handle this? For instance, a system call to tell what type of instruction cache one has, and where the boundaries are?

Another question: do the best C++ compilers (Visual C++, I suppose) handle this problem? I don't hear such complaints from them.
Title: Re: Weird speed differences for seemingly identical code
Post by: jj2007 on August 28, 2015, 04:46:42 AM
Good question. So far we have never run into this problem, maybe because our proggies are small. In contrast, fbsl.exe is half a megabyte, so it can easily fill a 32 KB or 64 KB instruction cache. Still, the exact mechanism is not very clear to me. But obviously, one more nop can make a really big difference.