The MASM Forum

General => The Laboratory => Topic started by: jj2007 on November 30, 2017, 10:48:25 PM

Title: QueryPerformanceCounter: Sleep slows down my code
Post by: jj2007 on November 30, 2017, 10:48:25 PM
A simple loop using NanoTimer (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1171) aka QueryPerformanceCounter:  For_ ecx=0 To loops-1
push ecx
  invoke Sleep, 0    ; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  pop ecx
NanoTimer() ; start the QueryPerformanceCounter timer
void GetHash("Just a little test of the hash function")
add total, NanoTimer(us) ; add time for calculating the average, end QueryPerformanceCounter
mov ms(ecx), eax ; mov time into array
  Next


One would expect that most timings are near the physical limit, but a few are much higher because of interrupts. Here is the distribution:
(http://www.webalice.it/jj2006/PlotHashSleep0.png)

What I don't understand is the result for invoke Sleep, 1:
(http://www.webalice.it/jj2006/PlotHashSleep1.png)

Suddenly, timings increase drastically, with a few exceptions to the left. We always used Sleep to improve the accuracy of timings, but here it seems to have the opposite effect...

Executables and source attached.
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: Siekmanski on December 01, 2017, 02:50:53 AM
Timers in windows are really a pain in the ass..
Especially when you need to poll a message at a fixed interval with a timer interrupt within 1 millisecond.  :(


Sleep,0
(http://members.home.nl/siekmanski/Sleep0.png)

Sleep,1
(http://members.home.nl/siekmanski/Sleep1.png)
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: aw27 on December 01, 2017, 05:55:48 AM
Probably you should not consider the first measurements, they will take longer because there is nothing in the cache.
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: jj2007 on December 01, 2017, 06:12:22 AM
Quote from: aw27 on December 01, 2017, 05:55:48 AM
Probably you should not consider the first measurements, they will take longer because there is nothing in the cache.

Interesting idea: ; ArraySort ms() ; no sorting, show timings in their original sequence

Doesn't solve the Sleep(0) vs Sleep(1) mystery, but you are absolutely right about the first timings being much higher :t
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: aw27 on December 01, 2017, 08:54:33 AM
Quote
Doesn't solve the Sleep(0) vs Sleep(1) mystery,
I think it solves. The longer the thread sleeps the more chances has the same CPU thread of being reused elsewhere, either in the same process or in another process.  This means that the cache will be invalidated shortly after that.
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: jj2007 on December 01, 2017, 10:52:25 AM
Good point, José :t
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: hutch-- on December 01, 2017, 11:52:30 AM
You know roughly what SleepEx does, it suspends the thread, waits a duration then returns in the next time slice. It will garrantee slow down your code which is close to what its designed to do.
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: jj2007 on December 01, 2017, 12:17:25 PM
Quote from: hutch-- on December 01, 2017, 11:52:30 AM
You know roughly what SleepEx does, it suspends the thread, waits a duration then returns in the next time slice. It will garrantee slow down your code which is close to what its designed to do.

The timing starts after the return from Sleep. Check José's argument, it's more convincing.
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: hutch-- on December 01, 2017, 11:33:42 PM
What Jose said is fine, what I gather is you are running timings and in that context I would imagine the you would want to avoid cache effects. Make the SleepEx duration longer so that you DO lose any cache effects and the accuracy of the timing should improve. You are still at the mercy of ring0 interference and there is likely to be some degree of wandering in the next time slice available after SleepEx gives control back to the timing thread. This is why I avoid short intense timing methods, there are too many variables that are either difficult or impossible to control.
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: aw27 on December 01, 2017, 11:48:50 PM
You can also increase the Process Priority and the Thread Priority (The normal Thread Priority is the Process Priority). No thread is interrupted by a lower priority thread and a higher priority interrupts lower priority. This may have side effects in the Sleep behaviour, though.  :icon_rolleyes:
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: jj2007 on December 02, 2017, 12:50:30 AM
Quote from: hutch-- on December 01, 2017, 11:33:42 PMMake the SleepEx duration longer so that you DO lose any cache effects and the accuracy of the timing should improve.

That is what I would expect, too. But the evidence tells a different story - this is Sleep(100) before the timing starts. Much worse than before, and a number of "negative" outliers. And the median is about 5x higher than it should be.

Compare that to the lower graph: Sleep(0), realtimepriority, unsorted = sequential. It's the cache and nothing else. Which opens the question whether we want to time algos with or without cache ::)
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: Siekmanski on December 02, 2017, 02:15:55 AM
Quote from: aw27 on December 01, 2017, 11:48:50 PM
You can also increase the Process Priority and the Thread Priority (The normal Thread Priority is the Process Priority). No thread is interrupted by a lower priority thread and a higher priority interrupts lower priority. This may have side effects in the Sleep behaviour, though.  :icon_rolleyes:

It works very well.

I have used "Sleep,1" in a multimedia interrupt timer thread with THREAD_PRIORITY_TIME_CRITICAL.
The timer is running 200 nanoseconds slower as the interrupt time I want.
When it slowly drifts below the interrupt time, I pause it with "Sleep,1" and reset the timer.
It stays "stable" within 1 millisecond and it keeps the processor load very small.

I needed this because I had to synchronize with the per 10 millisecond write/play cursor message from DirectSound which is jittery as hell and would not spend another 10 milliseconds waiting for the next message and lose track of the buffer positions. Now I have the lowest latency possible with DirectSound streaming buffers.
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: aw27 on December 02, 2017, 04:00:33 AM
Quote from: Siekmanski on December 02, 2017, 02:15:55 AM
I have used "Sleep,1" in a multimedia interrupt timer thread with THREAD_PRIORITY_TIME_CRITICAL.
For very short runs it may work very well as you confirmed.  :t
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: hutch-- on December 02, 2017, 08:59:45 AM
Something else that helps to stabilise the timings is to use a serialised instruction before it, CPUID is one that will do that, clear the cache before the timing. As you would be aware, modern x86/64 hardware is a lot messier when it comes to timing code. This Haswell E/EP I am using sleeps in "noddy mode" at about 1.2 gig until you load it where it comes up close to the 3.3 gig it tops out at.

Power saving, variable clock rates, out of order instruction sequencing etc etc etc .... are all designed to make your life more miserable so I personally test with long duration test pieces to try and get some idea if an algo is faster than another. To test attack (how quick and algo is started and run) you make a short test and run it many times so you get a duration of a second or so. For long linear tests I use the massive amount of memory available in win64 to test linear speed so if you have to process massive log files or large videos you can clock how fast a long single read/write takes.

Just a note with SleepEx, the granularity is about 20ms so SleepEx,1,0 is no faster than SleepEx,20,0. From memory it is a hardware based interrupt.
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: jj2007 on December 02, 2017, 01:47:05 PM
Quote from: hutch-- on December 02, 2017, 08:59:45 AMthe granularity is about 20ms so SleepEx,1,0 is no faster than SleepEx,20,0

Right. Under the hood, it calls NtDelayExecution, exactly like Sleep(delay), which saves two bytes compared to the identical SleepEx, delay, 0:SleepEx         Ú$  8BFF              mov edi, edi                            ; STATUS_X kernel32.SleepEx(Time,Alertable)
75811217        ³.  55                push ebp
75811218        ³.  8BEC              mov ebp, esp
7581121A        ³.  5D                pop ebp
7581121B        À. EB ED             jmp short <jmp.&API-MS-Win-Core-Synch-L ; Jump to KERNELBASE.SleepEx
... ->NtDelayExecution


Guess what are the values of edi, ebp and esp after the first four lines have been executed - C compilers at work :P

Below the distributions (serial, not sorted) for:
- Sleep(100), no CPUID
- Sleep(100), with CPUID
- Sleep(0), no CPUID
- Sleep(100), with CPUID

Note the average of 780 ms for the upper 2 graphs: Remember the Sleep(100) comes before the timings starts.
In contrast, the lower 2 Sleep(0) runs (with 500 elements) show spikes at regular intervals; but even those, at about 200 microsecs, are far below the 780ms average of the Sleep(100) runs.
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: nidud on December 02, 2017, 02:12:58 PM
deleted
Title: Best strategy for timings
Post by: jj2007 on December 02, 2017, 02:38:45 PM
Finally, a graph showing the results for the optimal timing strategy:
- discard first x timings to load the cache
- sort the results
- eliminate the highest 20%, i.e. the spikes

Note that the average is calculated before the top 20% got eliminated. The real average is 101.5 microseconds.

@nidud: yes, of course, mov edi, edi is the hot-patch instruction that you find at the entry of most API functions. But the rest is funny ;)
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: Siekmanski on December 02, 2017, 09:29:29 PM
From the ntdll.dll these functions are interesting,

ZwQueryTimerResolution
ZwSetTimerResolution
ZwDelayExecution

Maybe we can create our own Sleep function with a 100 nanoseconds granularity and get below 1 millisecond?

more info: http://www.google.nl/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&cad=rja&uact=8&ved=0ahUKEwj0ioejiuvXAhUSpKQKHXRPBaUQFgg-MAM&url=http%3A%2F%2Fread.pudn.com%2Fdownloads137%2Fdoc%2F584096%2F%25E6%259C%25AA%25E6%2596%2587%25E6%25A1%25A3%25E5%258C%2596%25E5%2587%25BD%25E6%2595%25B0%2F10%2520Time.pdf&usg=AOvVaw2OY09pfo_iU8iIPP8gE0vM
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: jj2007 on December 02, 2017, 11:33:47 PM
Quote from: Siekmanski on December 02, 2017, 09:29:29 PM
From the ntdll.dll these functions are interesting,

ZwQueryTimerResolution
ZwSetTimerResolution
ZwDelayExecution

These are for drivers; the equivalent Ntxx functions do the same.

QuoteMaybe we can create our own Sleep function with a 100 nanoseconds granularity and get below 1 millisecond?

In theory, yes, but it would increase power consumption.

I have another little project: a Delay2AbsoluteTime macro. Here are first results:

13:29:42.066  - end should be two seconds later, at hh:mm:??.123 millisecs
13:29:44.123    2057 ms passed

13:29:44.273  - end should be two seconds later, at hh:mm:??.123 millisecs
13:29:46.123    1850 ms passed

13:29:46.146  - end should be two seconds later, at hh:mm:??.123 millisecs
13:29:48.123    1977 ms passed

13:29:48.168  - end should be two seconds later, at hh:mm:??.123 millisecs
13:29:50.123    1955 ms passed

13:29:50.411  - end should be two seconds later, at hh:mm:??.123 millisecs
13:29:52.123    1712 ms passed

13:29:52.341  - end should be two seconds later, at hh:mm:??.123 millisecs
13:29:54.123    1782 ms passed

13:29:54.396  - end should be two seconds later, at hh:mm:??.123 millisecs
13:29:56.123    1727 ms passed

13:29:56.196  - end should be two seconds later, at hh:mm:??.123 millisecs
13:29:58.123    1927 ms passed

13:29:58.385  - end should be two seconds later, at hh:mm:??.123 millisecs
13:29:58.123    1 ms passed

13:29:58.393  - end should be two seconds later, at hh:mm:??.123 millisecs
13:29:58.123    1 ms passed


Interestingly enough, the granularity is not the usual 16 or so millisecs (1/64) of Sleep & friends but rather 1 ms.

P.S. - results from powercfg -energy duration 5 (google translated), it seems that Pale Moon is also greedy:Platform timer resolution: Platform timer resolution
The default platform timer resolution is 15.6 ms (15625000 ns) and must be used whenever the system is idle. If the timer resolution increases, the processor's energy-saving technologies may not be effective. The timer resolution may increase in the case of multimedia playback or graphic animations.
Current timer resolution (100 ns unit) 10000
Max timer interval (100 ns unit) 156001

Platform timer resolution: Pending timer request
A program or service has requested a resolution of the timer that is less than the maximum resolution of the platform timer.
Requested range 10000
Request process ID 6028
Request process location \ Device \ HarddiskVolume3 \ Program Files (x86) \ Pale Moon \ palemoon.exe


So my system is running at 1 ms resolution if the browser is open :(
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: Siekmanski on December 03, 2017, 01:24:40 AM
Quote from: jj2007 on December 02, 2017, 11:33:47 PM
Quote from: Siekmanski on December 02, 2017, 09:29:29 PM
From the ntdll.dll these functions are interesting,

ZwQueryTimerResolution
ZwSetTimerResolution
ZwDelayExecution

These are for drivers; the equivalent Ntxx functions do the same.

Forgot to mention that...

Did a quick test with NtQueryTimerResolution:

CurrentResolution =  10003
MaximumResolution =   5000
MinimumResolution = 156250

Seems we can have a "Sleep" function of 0.5 milliseconds.
I'll have to check first if it is useful or not in a timer interrupt, and how it behaves on different computers. ( cpu load etc. )

QuoteI have another little project: a Delay2AbsoluteTime macro.

Timing in windows is a real challenge.
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: jj2007 on December 03, 2017, 01:31:32 AM
> NtQueryTimerResolution
Nice - I used powercfg -energy duration 5 which is much clumsier :t

Further analysis shows that my browser Pale Moon is indeed guilty of setting the resolution to 1ms, thus causing increased CPU load and power consumption, BUT: it doesn't so automatically. If I open the browser with a static page, granularity is as usual 1/64; only if I visit Google's baby called "YouTube", the resolution suddenly jumps up, and granularity goes to 1ms. Good to know :bgrin:

P.S.: Can I get some info, please?This is Windows version 6.1, build 7601
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz

initial res:    100000

current res
esi             0
edi             5000

current res
esi             10000
edi             10000

current res
esi             12500
edi             12500

current res
esi             25000
edi             25000

current res
esi             50000
edi             50000

current res
esi             100000
edi             100000

This is Windows version 5.1, build 2600
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz

initial res:    100144

current res
esi             0
edi             10032


Weird results for Win10, it seems to be stuck at 10000:This is Windows version 10.0, build 15063
Intel(R) Celeron(R) CPU  N2840  @ 2.16GHz

initial res:    10014

current res
esi             0
edi             5007

lowest, highes, current res
eax             5000
edx             5007
ecx             156250

current res
esi             10000
edi             10014

lowest, highes, current res
eax             5000
edx             10014
ecx             156250

lowest, highes, current res
eax             5000
edx             10014
ecx             156250
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: Siekmanski on December 03, 2017, 07:37:14 AM
I've implemented the 0.5 ms "Sleep" routine in my timer interrupt.
It works perfect.
Now I can keep track with the write/play position message with more accuracy and precision. ( within 4 samples at a samplerate of 44100Hz )
Without losing a message or producing clicks in the sound, I'm happy !  :biggrin:
And the cpu load is very small 0.0 - 0.2 % on my computer.
Always thought that it was impossible without a big cpu load.

Jochen, thanks for the inspiring thread.

(http://members.home.nl/siekmanski/Dsound%20timer.png)
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: jj2007 on December 03, 2017, 09:09:21 AM
Quote from: Siekmanski on December 03, 2017, 07:37:14 AMAnd the cpu load is very small 0.0 - 0.2 % on my computer.

For the program, or the whole computer? The setting is global...

QuoteJochen, thanks for the inspiring thread.

Still struggling to understand everything. For my Win7-64 machine, it is pretty clear, but the XP VM and the Win10 notebook show an entirely different behaviour for NtQueryTimerResolution :(
Title: Re: QueryPerformanceCounter: Sleep slows down my code
Post by: Siekmanski on December 03, 2017, 10:05:41 AM
For the program only 0.1 % ( see task manager "AudioDSwav.exe" in the top right of the posted image )

I have to test it on other computers, and see how it behaves.