Author Topic: A better benchmark approach ?  (Read 1430 times)

guga

A better benchmark approach ?
« on: July 14, 2014, 06:32:45 PM »
Hi guys

After reading the article "Case Study: Windows NT SharedUserData" on the inner members of the KUSER_SHARED_DATA structure (which I'm trying to document properly for the different OSes), I was wondering whether the technique shown in the article could be used to improve the benchmarking apps we use on the forum.

Since every version of Windows since NT keeps the first members of the KUSER_SHARED_DATA structure in the same layout, we can easily write a routine that accesses the structure and gathers the data needed to build a more reliable benchmark app.

For instance, the timing info can be retrieved with something like this:

Code:

; Simple equates giving the offset of each member of the KUSER_SHARED_DATA structure (Windows NT layout). In this example the OS doesn't matter, because all we need is the first 8 members of the structure,
; which are unchanged in all the other OSes. So perhaps all we need is TickCountLow through SystemTime.High2Time (TimeZoneBias seems useless for our purpose).

[KUSER_SHARED_DATA_NT.TickCountLowDis 0
 KUSER_SHARED_DATA_NT.TickCountMultiplierDis 4
 KUSER_SHARED_DATA_NT.InterruptTime.LowPartDis 8
 KUSER_SHARED_DATA_NT.InterruptTime.High1TimeDis 12
 KUSER_SHARED_DATA_NT.InterruptTime.High2TimeDis 16
 KUSER_SHARED_DATA_NT.SystemTime.LowPartDis 20
 KUSER_SHARED_DATA_NT.SystemTime.High1TimeDis 24
 KUSER_SHARED_DATA_NT.SystemTime.High2TimeDis 28
 KUSER_SHARED_DATA_NT.TimeZoneBias.LowPartDis 32
 KUSER_SHARED_DATA_NT.TimeZoneBias.High1TimeDis 36
 KUSER_SHARED_DATA_NT.TimeZoneBias.High2TimeDis 40
 KUSER_SHARED_DATA_NT.ImageNumberLowDis 44
 KUSER_SHARED_DATA_NT.ImageNumberHighDis 46
 KUSER_SHARED_DATA_NT.NtSystemRootDis 48
 KUSER_SHARED_DATA_NT.DosDeviceMapDis 568
 KUSER_SHARED_DATA_NT.CryptoExponentDis 572
 KUSER_SHARED_DATA_NT.TimeZoneIdDis 576
 KUSER_SHARED_DATA_NT.DosDeviceDriveTypeDis 580
 KUSER_SHARED_DATA_NT.NtProductTypeDis 612
 KUSER_SHARED_DATA_NT.Padding0Dis 613
 KUSER_SHARED_DATA_NT.ProductTypeIsValidDis 616
 KUSER_SHARED_DATA_NT.NtMajorVersionDis 620
 KUSER_SHARED_DATA_NT.NtMinorVersionDis 624
 KUSER_SHARED_DATA_NT.ProcessorFeaturesDis 628]

[Size_Of_KUSER_SHARED_DATA_NT 692]

Get_KernelTimmings:
     mov esi &MM_SHARED_USER_DATA_VA; (This equate has the value: 07FFE0000)
     mov eax D$esi+KUSER_SHARED_DATA_NT.TickCountLowDis
     mov ebx D$esi+KUSER_SHARED_DATA_NT.TickCountMultiplierDis
     mov ecx D$esi+KUSER_SHARED_DATA_NT.InterruptTime.LowPartDis
     ; ....
     ; and so on, until the last member we need (KUSER_SHARED_DATA_NT.SystemTime.High2TimeDis) has been read
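As a rough illustration of what the dwords at offsets 0 and 4 are good for: on NT-family systems up to XP/2003, GetTickCount is computed from TickCountLow and TickCountMultiplier, the multiplier being a fixed-point factor with 24 fractional bits. The C sketch below assumes that formula. The function name `tick_count_ms` and the idea of passing the base pointer in (instead of hard-coding 07FFE0000) are mine, chosen so the routine can be tried on ordinary memory. As far as I know, Vista and later keep the dword at offset 0 only as a deprecated field, so check the target OS before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: milliseconds since boot from the two legacy fields.
   `shared` points at the start of the shared-data page viewed as dwords
   (assumption: the caller supplies the pointer; nothing is hard-coded). */
uint32_t tick_count_ms(const volatile uint32_t *shared)
{
    assert(shared != NULL);

    uint32_t tick_count_low  = shared[0];  /* offset 0: TickCountLow        */
    uint32_t tick_multiplier = shared[1];  /* offset 4: TickCountMultiplier */

    /* 32x32 -> 64-bit multiply, then drop the 24 fractional bits. */
    uint64_t product = (uint64_t)tick_count_low * tick_multiplier;
    return (uint32_t)(product >> 24);
}
```

With a multiplier of 0FA00000 (15.625 ms per tick), 1000 ticks come out as 15625 ms. Taking the difference of two such values around the code under test gives elapsed milliseconds, but only at clock-tick granularity.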

Once the necessary info has been gathered, we can simply compute the difference between the values read before and after the function we intend to benchmark. I haven't quite understood how to convert the interrupt time to "our" time (in ms, meaning the time a function actually takes to execute all of its opcodes, like a conversion from cycles to milliseconds), but the idea seems interesting and a bit more accurate than using Windows APIs, which may lose some accuracy because it takes some time for a call to reach the kernel timing routines.
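On the conversion question, a minimal sketch in C, under two assumptions: InterruptTime and SystemTime count in 100-nanosecond units (so 10,000 units make one millisecond), and a 32-bit reader should read High1Time, then LowPart, then compare against High2Time, retrying when the two high parts differ. The names `ksystem_time`, `read_ksystem_time` and `elapsed_ms` are illustrative only, and the routine takes a pointer so it can be exercised on ordinary memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Three consecutive dwords, as at offsets 8/12/16 (InterruptTime)
   and 20/24/28 (SystemTime). Illustrative type name. */
typedef struct {
    uint32_t low_part;
    int32_t  high1_time;
    int32_t  high2_time;
} ksystem_time;

/* Read a consistent 64-bit value: if the two high parts differ,
   an update was in progress, so read again. */
uint64_t read_ksystem_time(const volatile ksystem_time *t)
{
    int32_t  high;
    uint32_t low;

    assert(t != NULL);
    do {
        high = t->high1_time;
        low  = t->low_part;
    } while (high != t->high2_time);

    return ((uint64_t)(uint32_t)high << 32) | low;
}

/* Difference of two readings, converted from 100 ns units to ms. */
double elapsed_ms(uint64_t before, uint64_t after)
{
    return (double)(after - before) / 10000.0;
}
```

A difference of 40,000 units is therefore 4 ms. As far as I know these fields are only updated on each clock interrupt, so a single short function will usually show a difference of zero and has to be looped many times for the delta to mean anything.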

Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

hutch--

Re: A better benchmark approach ?
« Reply #1 on: July 14, 2014, 09:00:32 PM »
Guga,

As long as it is on a multi-core processor, the most accurate method I can think of that will run in ring3 is a multimedia timer running in one thread, with global variables used for signalling from a second thread. The second thread runs the benchmark and signals its start and end, and the timer thread times the interval between the two signals. It would not be of any real use on a single-core machine, as both threads would be driven from the same core and both would run slower.
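The shape of that two-thread arrangement can be sketched in portable C. This is not the Windows multimedia timer itself: a POSIX thread that sleeps for about 1 ms per loop stands in for the timer, and C11 atomics stand in for the global signalling variables. All names (`benchmark_ticks`, `sample_workload`, the `g_` globals) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

static atomic_int  g_running;  /* 1 while the code under test is running */
static atomic_int  g_quit;     /* tells the timer thread to exit         */
static atomic_long g_ticks;    /* ~1 ms ticks counted while g_running    */

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Stand-in for the multimedia timer: wake roughly every millisecond and
   count a tick whenever the benchmark thread has signalled "started". */
static void *timer_thread(void *unused)
{
    (void)unused;
    while (!atomic_load(&g_quit)) {
        sleep_ms(1);
        if (atomic_load(&g_running))
            atomic_fetch_add(&g_ticks, 1);
    }
    return NULL;
}

/* Run fn on the calling thread while the timer thread counts ticks.
   Returns the tick count, or -1 if the timer thread could not start. */
long benchmark_ticks(void (*fn)(void))
{
    pthread_t timer;

    assert(fn != NULL);
    atomic_store(&g_ticks, 0);
    atomic_store(&g_running, 0);
    atomic_store(&g_quit, 0);
    if (pthread_create(&timer, NULL, timer_thread, NULL) != 0)
        return -1;

    atomic_store(&g_running, 1);   /* signal: start */
    fn();
    atomic_store(&g_running, 0);   /* signal: end   */

    atomic_store(&g_quit, 1);
    pthread_join(timer, NULL);
    return atomic_load(&g_ticks);
}

/* Placeholder workload for trying the harness: idles for about 50 ms. */
void sample_workload(void)
{
    sleep_ms(50);
}
```

Calling `benchmark_ticks(sample_workload)` should report a few dozen ticks. The resolution is limited by how promptly the timer thread wakes up, which is why the approach depends on a free core.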
hutch at movsd dot com
http://www.masm32.com