Counting cycles isn't easy. There is a fundamental problem with the usual macros: as a normal application, we run in ring 3, which means we have no direct hardware access.
We know little about task switches, micro-operations, cache misses or branch mispredictions. In principle we could determine these, but doing so requires write access to certain control registers, and that
isn't possible in ring 3.
What can we do? Under Linux we could use a kernel module, which would give us exclusive access to the CPU. Under BSD or Windows, a driver would be needed. Only then would
the measured values be reliable and meaningful. This path is tedious and costs a lot of time.
But there's another option. Under plain DOS there are no task switches, so exclusive access to the CPU is guaranteed. I wrote a small program that counts the cycles for a short code sequence. The
measurement is repeated 20 times so that cache warm-up effects can settle. All 20 results are printed at the end, purely to show from which iteration the values stabilize. Of course, the median, arithmetic mean
or variance could be calculated, but that would only be statistical smoothing without any factual basis.
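The "from which iteration do the values stabilize" check can also be done mechanically instead of by eye. What follows is only a minimal Python sketch and not part of the PowerBASIC/JWASM program: the function name `stable_from` and its `tolerance` parameter are assumptions made for illustration, and the sample list is the plain FreeDOS run listed further down.

```python
def stable_from(cycles, tolerance=0):
    """Return the 1-based iteration from which every later value stays
    within `tolerance` cycles of the smallest value seen in the run.
    Returns None if even the last value is outside that band."""
    floor = min(cycles)
    start = None
    for i, c in enumerate(cycles, start=1):
        if c - floor <= tolerance:
            if start is None:
                start = i          # a new stable stretch begins here
        else:
            start = None           # an outlier ends the stretch
    return start

# Values copied from the plain FreeDOS run in this post.
plain_freedos = [82, 82, 84, 84, 86, 82, 82, 82, 84, 82,
                 82, 82, 82, 82, 82, 82, 82, 82, 82, 82]

print(stable_from(plain_freedos))      # strict: no deviation allowed
print(stable_from(plain_freedos, 2))   # allow a 2-cycle wobble
```

With no tolerance this run is stable from iteration 10; if a 2-cycle wobble is accepted, it is stable from iteration 6. Using the minimum as the reference is a design choice of this sketch, not something the measuring program does.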
The application is written with PowerBASIC and JWASM. This is the first step; I'm working on a version written entirely in assembly language. A short sequence of FPU instructions is tested: load
double, 4 floating-point divisions, store double. Testing the program can't do any harm. Here is the output under DOSBox 0.74-3:
Sorry!
The usage of the Time Stamp Counter isn't possible
with the available CPU.
Program ends now.
This is correct, because DOSBox only emulates an 80486 by default, and the Time Stamp Counter was introduced with the Pentium. Here is the output under FreeDOS running in VirtualBox:
Iteration 1: 104 Cycles
Iteration 2: 106 Cycles
Iteration 3: 104 Cycles
Iteration 4: 106 Cycles
Iteration 5: 104 Cycles
Iteration 6: 108 Cycles
Iteration 7: 100 Cycles
Iteration 8: 102 Cycles
Iteration 9: 98 Cycles
Iteration 10: 108 Cycles
Iteration 11: 104 Cycles
Iteration 12: 106 Cycles
Iteration 13: 100 Cycles
Iteration 14: 108 Cycles
Iteration 15: 104 Cycles
Iteration 16: 108 Cycles
Iteration 17: 102 Cycles
Iteration 18: 102 Cycles
Iteration 19: 100 Cycles
Iteration 20: 106 Cycles
Please, press any key to end the application...
The same machine, with the application started under plain FreeDOS without any drivers:
Iteration 1: 82 Cycles
Iteration 2: 82 Cycles
Iteration 3: 84 Cycles
Iteration 4: 84 Cycles
Iteration 5: 86 Cycles
Iteration 6: 82 Cycles
Iteration 7: 82 Cycles
Iteration 8: 82 Cycles
Iteration 9: 84 Cycles
Iteration 10: 82 Cycles
Iteration 11: 82 Cycles
Iteration 12: 82 Cycles
Iteration 13: 82 Cycles
Iteration 14: 82 Cycles
Iteration 15: 82 Cycles
Iteration 16: 82 Cycles
Iteration 17: 82 Cycles
Iteration 18: 82 Cycles
Iteration 19: 82 Cycles
Iteration 20: 82 Cycles
Please, press any key to end the application...
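To put a number on the gap between the two runs, the printed values can be summarized. This is a rough Python sketch outside the original program: the helper name `summarize` is made up for illustration, and the two lists are simply the cycle counts copied from the outputs above.

```python
from statistics import mean, median

# Cycle counts copied from the two outputs above.
virtualbox = [104, 106, 104, 106, 104, 108, 100, 102, 98, 108,
              104, 106, 100, 108, 104, 108, 102, 102, 100, 106]
plain_freedos = [82, 82, 84, 84, 86, 82, 82, 82, 84, 82,
                 82, 82, 82, 82, 82, 82, 82, 82, 82, 82]

def summarize(cycles):
    """Return (minimum, median, mean) of a list of cycle counts."""
    return min(cycles), median(cycles), mean(cycles)

for name, run in (("VirtualBox", virtualbox), ("plain FreeDOS", plain_freedos)):
    lo, med, avg = summarize(run)
    print(f"{name:14s} min={lo} median={med} mean={avg}")
```

On these values the minimum is 98 cycles under VirtualBox against 82 under plain FreeDOS, and the median is 104 against 82. The minimum is probably the most useful figure here, since disturbances can only add cycles, never remove them.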
Where does the difference come from? Why is the program slower under VirtualBox? Well, as mentioned at the beginning: under VirtualBox we are an application in ring 3, and we are additionally emulated.
Nevertheless, I would appreciate test runs and reports from other environments.