rdtsc

Started by jj2007, January 23, 2023, 07:30:56 PM

NoCforMe

Speaking as a definite non-expert here, I would say that the answer to the question "Is xxxx good enough for my timing needs?" would depend on just what those timing needs are, wouldn't you say? Depending on whether you need absolutely accurate, reproducible timings on a small section of critical code, or whether you just need a ballpark look at how fast or slow a routine you're working on is, and you don't need accuracy down to the microsecond. Amiright?
Assembly language programming should be fun. That's why I do it.

NoCforMe

Quote from: hutch-- on January 24, 2023, 12:05:06 PM
I also have a code mountain in front of me bigger than Mount Everest.

OK, you're excused. For now ...

jj2007

  include \masm32\MasmBasic\MasmBasic.inc
  Init
  CyCtInit
  CyCtStart
fldpi ; ST(0) = pi
fmul FP8(100.0) ; ST(0) = pi*100
fdiv FP4(10.0) ; ST(0) = pi*100/10
fstp st ; pop and discard the result
  CyCtEnd PI*100/10 ; describe what the code does
  EndOfCode


Results for repeated runs:
+17      Cycles for PI*100/10
+17      Cycles for PI*100/10
+17      Cycles for PI*100/10
+18      Cycles for PI*100/10
+15      Cycles for PI*100/10
+18      Cycles for PI*100/10
+18      Cycles for PI*100/10
+16      Cycles for PI*100/10
+18      Cycles for PI*100/10
+17      Cycles for PI*100/10
+17      Cycles for PI*100/10
+18      Cycles for PI*100/10
+17      Cycles for PI*100/10
+17      Cycles for PI*100/10
+17      Cycles for PI*100/10
+17      Cycles for PI*100/10


There is a lot of statistical analysis, outlier elimination etc. under the hood, and yet I've seen runs on other people's machines that showed negative values. Timing is, ehm, challenging. The Lab is full of heroic attempts to get higher precision.
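
To make "statistical analysis, outlier elimination" a bit more concrete, here is a minimal Python sketch. It is not what MasmBasic does internally (that code is not shown in this thread); the helper names trimmed and net_cycles, the 25% trim and the noisy_raw / noisy_overhead numbers are all assumptions for illustration. It also shows one plausible way a negative result can appear: the separately measured timing overhead happens to come out larger than the measurement itself.

```python
from statistics import median

def trimmed(samples, trim=0.25):
    """Return the sorted samples with the lowest and highest `trim` fraction removed."""
    ordered = sorted(samples)
    cut = int(len(ordered) * trim)
    return ordered[cut:len(ordered) - cut] if cut else ordered

def net_cycles(raw, overhead, trim=0.25):
    """Median of the trimmed raw counts minus median of the trimmed overhead counts."""
    return median(trimmed(raw, trim)) - median(trimmed(overhead, trim))

# The 16 cycle counts posted above.
posted = [17, 17, 17, 18, 15, 18, 18, 16, 18, 17, 17, 18, 17, 17, 17, 17]
print("typical value of the posted runs:", median(trimmed(posted)))

# Hypothetical numbers: the empty-loop overhead was sampled at a slightly
# slower moment than the code under test, so the difference goes negative.
noisy_raw = [40, 41, 39, 40]
noisy_overhead = [42, 41, 42, 43]
print("net cycles on a noisy machine:", net_cycles(noisy_raw, noisy_overhead))
```

Trimming both ends before taking the median keeps a single preempted run (or a single lucky one) from dragging the reported figure around.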

For my daily needs, I use NanoTimer() alias QueryPerformanceCounter. It's not really useful for very short code like the one above, but it is easy to use and much more precise than GetTickCount().
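
For anyone who wants to try this kind of timer outside assembly, a minimal Python sketch follows. It relies on time.perf_counter_ns(), which on Windows is built on QueryPerformanceCounter (time.get_clock_info reports the implementation actually in use on your machine). The names best_of and workload are hypothetical, and the workload is only a stand-in for a routine long enough to be timed this way.

```python
import time

def best_of(func, repeats=5):
    """Return the shortest wall-clock duration of func() in nanoseconds."""
    best = None
    for _ in range(repeats):
        start = time.perf_counter_ns()
        func()
        elapsed = time.perf_counter_ns() - start
        if best is None or elapsed < best:
            best = elapsed
    return best

def workload():
    # Hypothetical stand-in for the routine being timed.
    return sum(i * i for i in range(100_000))

info = time.get_clock_info("perf_counter")
print("clock:", info.implementation, "resolution:", info.resolution)
print("best of 5:", best_of(workload), "ns")
```

Taking the best of several runs is a design choice, not the only option: it reports the least disturbed run, which suits "how fast can this go" questions but hides run-to-run variation.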

NoCforMe

Another related question: Since it's hard to get accurate timings because we're not in total control of the CPU on account of preemptive multitasking, would it be possible to get more CPU cycles to ourselves, basically hogging the CPU, in order to get a better count? Could this be done by, say, changing our process's priority level under Windows?

hutch--

David,

You normally do that when you time something. Have nothing else running, up the priority then run the timing on the algo. I have a toy that times long duration tasks (minutes) that I use for mass video processing but I have never seen a decent timing technique for very short duration tasks.
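
A minimal sketch of the "up the priority" step, written in Python with ctypes so it can be tried quickly. It assumes Windows and uses the documented kernel32 calls GetCurrentProcess and SetPriorityClass; the function name raise_priority is hypothetical, and on any other OS the sketch simply does nothing and returns False.

```python
import os

HIGH_PRIORITY_CLASS = 0x00000080  # value from the Windows SDK headers

def raise_priority():
    """Try to move this process to the high priority class; True on success."""
    if os.name != "nt":
        return False  # this sketch only covers Windows
    import ctypes
    from ctypes import wintypes
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    kernel32.SetPriorityClass.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    kernel32.SetPriorityClass.restype = wintypes.BOOL
    handle = kernel32.GetCurrentProcess()  # pseudo-handle, needs no closing
    return bool(kernel32.SetPriorityClass(handle, HIGH_PRIORITY_CLASS))

print("priority raised:", raise_priority())
```

A higher priority class reduces preemption by other user processes but does not remove interrupts or kernel work, so the timing should still be repeated and filtered.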

Agner Fog designed a technique that has to be done at boot without loading the OS but it did not represent how an algo performed running from ring 3 when the OS has ring 0 priority.

HSE

Quote from: hutch-- on January 24, 2023, 05:40:25 PM
Agner Fog designed a technique that has to be done at boot without loading the OS

That is exactly what I'm making with BenchOS.

Quote from: jj2007 on January 24, 2023, 12:35:34 PM
fldpi
fmul FP8(100.0)
fdiv FP4(10.0)
fstp st

It looks like that code, which produces no result, is not even executed on this machine. The measured cycles look like the cycles the opcodes take to pass through the pipeline :rolleyes:
Equations in Assembly: SmplMath

NoCforMe

Just for fun I went ahead and added timestamp logging to my LogBuddy debugging aid. Might be useful to someone ...