CodeTune Timing Analyzer - V 1.0 (Update 09/01/16)

Started by guga, December 30, 2015, 08:33:13 AM


guga


LIBRARY UPDATED !!!
This is an update of my benchmark library, "CodeTune". The library was made to give programmers an easier and better way to benchmark their functions.

The API is simple to use and works from Masm as well. (It was originally made for RosAsm; I ported it to Masm so you guys can use it too.)

Working examples in Masm and RosAsm are in the attached zip file (CodeTuneV1Light.zip).
Due to the size limit on the forum, the complete updated API guide is stored at the link at the end of this post.

The new version includes:

  • The CreateTimeProfileEx API, which runs all available algo methods and reports the fastest timing of the tested code. For more accurate measurements, consider using this API instead of the simpler CreateTimeProfile.
  • The STDEx structure, which stores the fastest value found by the function above
  • Cleaned-up source code
  • Masm and RosAsm examples
  • Faster and more stable functions

TODO:

  • Add APIs to estimate how long CreateTimeProfile and CreateTimeProfileEx will take to finish. (Useful if you want to show a progress bar or a timer, or simply to know how long the API will take to analyse your code.)
  • Build converter functions (nanoseconds to milliseconds, to gigahertz, to megahertz, to clock cycles, etc.)

  • Drink a beer (or 3), write a book, get married (again), explode the Congress, meet an alien, visit Epcot Center... and buy more beer! Not necessarily in that order  :icon_mrgreen: :icon_mrgreen: :icon_mrgreen:

The current API includes:

I – Functions

1 - Main functionality:

•   CreateTimeProfile
•   CreateTimeProfileEx

2 - Complementary functions:

•   RunTimeDataProc
•   SetupTimeProfiler
•   UserTargetProc

3 - Extras:

•   CpuSettIngs
•   GetCpuFrequencyEx

II – Structures

•   CPUData
•   CT_STANDARD_DEVIATION
•   CT_STDEx
•   CT_Nfo

III – Equates

•   CPU_CPUID_AVALIABLE
•   CPU_RDTSCP_AVALIABLE
•   CPU_RDTSC_AVALIABLE
•   CT_ALGO1
•   CT_ALGO2
•   CT_ALGO3
•   CT_ALGO4
•   CT_ALGO5
•   CT_ALGO6
•   CT_ALGO7
•   CT_ALGO8
•   CT_ALGO_METHOD_ERROR
•   CT_ANALYSIS_ERROR1
•   CT_ANALYSIS_ERROR2
•   CT_ANALYSIS_ERROR3
•   CT_ANALYSIS_ERROR4
•   CT_ANALYSIS_START
•   CT_ANALYSIS_SUCESS
•   CT_BENCHMARK_FINISHED
•   CT_BENCHMARK_RUNNING
•   CT_BENCHMARK_START
•   CT_CALIBRATION_FINISHED
•   CT_CALIBRATION_RUNNING
•   CT_CALIBRATION_START
•   CT_ERROR_BENCHMARK_OVERHEAD
•   CT_ERROR_CALIBRATION_OVERHEAD
•   CT_ERROR_INPUT_VALUE
•   CT_INCONCLUSIVE
•   CT_INSUFFICIENT_FEATURES
•   CT_STATUSCODE_ERROR
•   MAX_ITERATIONS
•   MAX_SAMPLES
•   OVERHEAD_LIMIT

IV – Type Definitions

•   LP_RUNTIMEDATA_CALLBACK_ROUTINE
•   LP_USERTARGET_CALLBACK_ROUTINE



A small tip: to improve the accuracy of the library, it helps to align your code. I don't know how to do a proper alignment in Masm, but the best approach is to align the start of each function in your app to a 16-byte boundary (I'm not talking about aligning the PE sections, only your functions). This makes your app run better and also improves the accuracy of the library itself.
The library's APIs are already aligned, but if you align your own code the same way you will probably get better performance (at least on 32-bit x86; I don't know how it behaves on 64-bit).


The complete documentation (guide and library) is stored at the link below (I couldn't attach it here due to size limits).

CodeTune Api Reference

Mirror

CodeTune Api Reference

A Light version containing only the library and examples is attached to this post!
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

A new version of the CodeTune library, ported to Masm. The zip file includes the dll plus an inc and a lib file already ported to Masm.

I hope I succeeded in translating it to Masm correctly. I'm currently updating the library and building other functions as well, such as an estimate of how long the algo will take to finish, a full analysis that runs all algorithm methods at once, and converters from nanoseconds to other units.

I'm also looking for a way to make a faster version of qsort. I posted earlier looking for a more specific one, but couldn't find it yet.

guga

Good news. In the next version I'll implement CreateTimeProfileEx, which will analyse the data more deeply. It takes into account all the available algo methods and interprets the results the same way the CreateTimeProfile function does.

In my preliminary tests I gained around 15% to 20% in accuracy!!! Today, after releasing the library, I improved it a little more and gained a bit more accuracy, but now that the new CreateTimeProfileEx function is finished, the accuracy has increased considerably :)

The side effect is the time spent computing. Using all 8 algorithms available on my CPU, the new function takes 8 to 10 times longer to run than CreateTimeProfile. It needs around 8 to 10 seconds to finish analyzing the MasmBasic strlen (which reached the amazing mark of only 3.74917033.... nanoseconds, something around 10.998..... clock cycles), while CreateTimeProfile takes only 1 to 1.2 seconds to complete.
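
The nanosecond and clock-cycle figures quoted above are tied together by the CPU frequency: cycles = nanoseconds x GHz. Here is a minimal Python sketch of that conversion, roughly the kind of converter mentioned in the TODO list. The function names are made up for illustration and are not part of CodeTune, and 10.998 is the truncated cycle count from the post, so the implied frequency is only approximate.

```python
def ns_to_cycles(nanoseconds, cpu_ghz):
    """Convert a duration in nanoseconds to clock cycles at cpu_ghz."""
    # 1 GHz is one cycle per nanosecond, so the product is already in cycles.
    return nanoseconds * cpu_ghz


def cycles_to_ns(cycles, cpu_ghz):
    """Convert a cycle count back to nanoseconds at cpu_ghz."""
    return cycles / cpu_ghz


def implied_ghz(nanoseconds, cycles):
    """Frequency (in GHz) that makes a nanosecond/cycle pair consistent."""
    return cycles / nanoseconds


# Truncated figures from the post: 3.74917033 ns and about 10.998 cycles.
ghz = implied_ghz(3.74917033, 10.998)
print(f"implied frequency: {ghz:.3f} GHz")
print(f"cycles at that frequency: {ns_to_cycles(3.74917033, ghz):.3f}")
```

A real converter would take the frequency from the CPU (for example via something like GetCpuFrequencyEx) instead of inferring it from a measurement.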

Time spent versus computations (iterations) in the new function:
a) 140 million iterations (loops) in 83.6710 seconds. Reached the mark of 3.37 nanoseconds (10.82 clock cycles) for the MasmBasic strlen.
b) 14 million iterations (loops) in 10 seconds. Reached the mark of 3.75 nanoseconds (11.01 clock cycles) for the MasmBasic strlen.

So between the two there is still a difference of about 1.75% in clock cycles. In other words, the extra 126 million loops bought about 1.75% more accuracy. This level of accuracy is good enough, because if the results stay stable I will be able to make an error estimate. So, instead of feeding the algo with 3000 samples and 3000 iterations, people can keep using 300 samples x 3000 iterations (or even fewer) so that the function runs faster and still keeps the accuracy intact.
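
The 1.75% figure follows from the two cycle counts quoted in a) and b). A small Python check of that arithmetic (the function name is just for illustration):

```python
def relative_gain_percent(short_run, long_run):
    """Percentage by which short_run exceeds long_run."""
    return (short_run - long_run) / long_run * 100.0


# Cycle counts from the post: 11.01 (14 million loops), 10.82 (140 million loops).
gain = relative_gain_percent(11.01, 10.82)
extra_loops = 140_000_000 - 14_000_000

print(f"extra loops: {extra_loops}")
print(f"difference in clock cycles: {gain:.2f}%")
```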

I'm currently working on the new function so I can include it in the library. I'm not sure I can reduce the time spent, but I'll do my best to keep the high accuracy with a short run time.

10 seconds may not seem like much, but that is the time on an i7. On older processors it may take longer to finish. That's why I need to optimize what I can without ruining the rest of the functions inside the library.


Btw: for those who want to take a look at the source code but don't have RosAsm installed, here it is. (Sorry, it is in RosAsm syntax, but that is close to Nasm, so the syntax shouldn't be a problem, I hope.)


guga

It is in the API guide linked in this topic. The link for the guide is:

http://www.4shared.com/zip/Kr64SbPbba/CodeTuneGuide.html

Basically, all you have to do is this:

Example:

   call 'CodeTune.CreateTimeProfile' 300, 3000, CT_ALGO6, MyPointer, Algorithm1, StatusCode


or


   call 'CodeTune.CreateTimeProfile' 300, 3000, CT_ALGO2, 0, Algorithm1, StatusCode



In Masm syntax it should be something like this (I'm not sure about the "ADDR" token. I don't remember the correct Masm syntax, since those tokens are unnecessary in RosAsm):


   invoke CreateTimeProfile, 300, 3000, CT_ALGO6, ADDR MyPointer, ADDR Algorithm1, offset StatusCode


or


   invoke CreateTimeProfile, 300, 3000, CT_ALGO2, 0, ADDR Algorithm1, offset StatusCode


The equate values (CT_ALGO2, CT_ALGO6....) are listed in the guide.

MyPointer = A pointer to a callback function from which you can show some information while the function is running. Useful if you plan to use CreateTimeProfile inside a thread.
Algorithm1 = A pointer to a function that holds the code to be tested. It works like the callback parameters of APIs such as CreateThread or qsort, except that it takes no parameters. It is a placeholder for the function you are testing.
StatusCode = A pointer to a variable that receives the current status of the API (whether it is running, whether it has finished, error messages and so on).

Read the guide, and if you are still having problems, let me know and I'll try to make a small example in Masm.

Btw, CreateTimeProfile calibrates itself internally, so you don't have to do anything like that. All you have to do is call this function, choosing the desired method (the CT_ALGOXXX equates) and filling in the parameters (1st = samples, 2nd = iterations....)
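
To give a rough idea of what the samples and iterations parameters mean, here is a generic benchmarking loop in Python. It is only a sketch of the general technique (time a batch of iterations, repeat that for several samples, subtract an empty-loop overhead measured beforehand). It is not CodeTune's actual code, and the names profile, target and on_progress are invented for the example.

```python
import time


def profile(samples, iterations, target, on_progress=None):
    """Return one average per-call time (in ns) for each sample."""
    # Calibration: time an empty loop so the loop overhead can be subtracted.
    start = time.perf_counter_ns()
    for _ in range(iterations):
        pass
    overhead = time.perf_counter_ns() - start

    results = []
    for done in range(1, samples + 1):
        start = time.perf_counter_ns()
        for _ in range(iterations):
            target()
        elapsed = time.perf_counter_ns() - start
        # Clamp at zero in case the overhead estimate exceeds the measurement.
        results.append(max(elapsed - overhead, 0) / iterations)
        if on_progress is not None:
            on_progress(done, samples)  # plays the role of a progress callback
    return results


progress = []
timings = profile(5, 1000, lambda: len("Hello World !"),
                  lambda done, total: progress.append((done, total)))
print(f"fastest sample: {min(timings):.2f} ns per call")
```

The callback here stands in for the kind of progress reporting described for MyPointer, and the lambda for the parameterless placeholder function.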

If I remember the Masm syntax correctly, Algorithm1 (the placeholder for the function to be benchmarked) can be written as:

.data
szText db "Hello World !",0

.code
Algorithm1 proc ; This function is a placeholder. Void (no parameters)
    invoke strlen, ADDR szText ; <--- function to be tested
    ret
Algorithm1 endp

Grincheux

Thanks. I only found source code, and that does not interest me. Profiling does. I will try.

guga

Only the source? That's weird  :dazzled:. I uploaded the binary (dll) and the lib/inc files as an attachment to the 1st post. Can't you download it from there? If not, I'll post it somewhere else for you.

Grincheux

The source does not interest me, only the tool (dll).
I don't need the source; I would not be able to understand it :(

guga

Hmm... you are not able to download the dll? What browser are you using? The file (dll) is attached to the 1st post.

I re-uploaded it here for you in case you can't download it from the board:

http://www.4shared.com/zip/pfkwttEBce/CodeTuneLibrary.html

Grincheux

I have the dll; that is ok.
I downloaded the html file from the link and I got VIRUSES.
Could you send another link, or send it by mail?

guga

A virus??? From 4shared? Are you sure it is not a false alarm? I just downloaded it and found no viruses here.

I'll send it to you by email.

guga

Ok guys, I finished the new function CreateTimeProfileEx. This function uses all the algorithms available on the user's CPU and provides 2 data sets to be interpreted.

One contains the best mean value, that is, the smallest mean value found using all available algorithms.

The other contains the fastest value found across all of the algorithm methods. So, next to the best mean described above, it locates the smallest (fastest) values found.

In general the 1st and 2nd data sets match, meaning that the fastest value was found with the same algorithm that produced the best mean... but sometimes they differ.

Sometimes the fastest value is not found with the same algorithm method that produced the best mean. The value to treat as the most accurate is always the smallest (fastest) one, regardless of what the "best" mean was.

This is because the algorithms collect the actual timings of the tested function. After all the internal interpretation is done, what is left are the values that best represent the real time, so the smallest one is the nearest to the time your function actually takes.
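
As an illustration of the two data sets described above, here is a small Python sketch that picks the method with the best (smallest) mean and the method that produced the single fastest value. The timings are made-up numbers chosen so that the two methods differ, and summarize is a hypothetical helper, not a CodeTune function.

```python
from statistics import mean


def summarize(timings):
    """timings maps a method name to its list of per-call times in ns."""
    best_mean_method = min(timings, key=lambda m: mean(timings[m]))
    fastest_method = min(timings, key=lambda m: min(timings[m]))
    return {
        "best_mean": (best_mean_method, mean(timings[best_mean_method])),
        "fastest": (fastest_method, min(timings[fastest_method])),
    }


# Made-up timings: CT_ALGO2 has the lower mean, CT_ALGO6 the lowest single value.
sample = {
    "CT_ALGO2": [3.9, 3.8, 4.0],
    "CT_ALGO6": [3.7, 4.4, 4.2],
}
result = summarize(sample)
print("best mean:", result["best_mean"])
print("fastest:  ", result["fastest"])
```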

The problem is that I can't find the right words for this in English. What are the proper terms to name those data sets? Or a proper name for the structure itself? (STDEx2 could be renamed as.... ?)

The representation is like this:

(Note: STDEx2 is what the function returns if it succeeds.)


[STDEx2:
STDEx2.BestTimming: R$ 0 ; smallest value
STDEx2.IDFound: D$ 0 ; the smallest value was found with this method

; Best Mean values
STDEx2.Data1.AlgoMethod: D$ 0 ; the method used to collect the best mean
STDEx2.Data1.Mean: R$ 0.0 ; <---- Below is a simple Standard deviation structure "CT_STANDARD_DEVIATION"
STDEx2.Data1.PopulationStd.Max: R$ 0.0
STDEx2.Data1.PopulationStd.Min: R$ 0.0
STDEx2.Data1.PopulationStd.Variance: R$ 0.0
STDEx2.Data1.PopulationStd.StandardDeviation: R$ 0.0
STDEx2.Data1.SampleStd.Max: R$ 0.0
STDEx2.Data1.SampleStd.Min: R$ 0.0
STDEx2.Data1.Sample.Variance: R$ 0.0
STDEx2.Data1.Sample.StandardDeviation: R$ 0.0

; Fastest value was found on this dataset
STDEx2.Data2.AlgoMethod: D$ 0 ; the fastest value was found with this method
STDEx2.Data2.Mean: R$ 0.0 ; ----> same as before: a simple standard deviation structure, for the data set containing the fastest value found
STDEx2.Data2.PopulationStd.Max: R$ 0.0
STDEx2.Data2.PopulationStd.Min: R$ 0.0
STDEx2.Data2.PopulationStd.Variance: R$ 0.0
STDEx2.Data2.PopulationStd.StandardDeviation: R$ 0.0
STDEx2.Data2.SampleStd.Max: R$ 0.0
STDEx2.Data2.SampleStd.Min: R$ 0.0  ;<------------ In general, the fastest value is always found on this member !
STDEx2.Data2.Sample.Variance: R$ 0.0
STDEx2.Data2.Sample.StandardDeviation: R$ 0.0]
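
To make the PopulationStd and SampleStd members above more concrete, here is a small Python sketch of the usual difference between the two: the population variance divides by N, the sample variance by N - 1. This is only an illustration using Python's statistics module with made-up numbers. How CodeTune fills these members internally is not shown in this thread, and the dictionary keys below merely mirror the member names.

```python
from statistics import mean, pstdev, pvariance, stdev, variance


def deviation_summary(samples):
    """Summarize timing samples (for example, nanoseconds per call)."""
    return {
        "Mean": mean(samples),
        "Max": max(samples),
        "Min": min(samples),
        # Population statistics divide by N.
        "PopulationStd.Variance": pvariance(samples),
        "PopulationStd.StandardDeviation": pstdev(samples),
        # Sample statistics divide by N - 1, so they come out slightly larger.
        "SampleStd.Variance": variance(samples),
        "SampleStd.StandardDeviation": stdev(samples),
    }


# Made-up timings, not measurements from CodeTune.
summary = deviation_summary([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```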


How should I name Data1 and Data2?

Btw: I'm deeply amazed by the speed of Jonchen's MasmBasicStrLen. The final time I found for his algo is only 3.74033346275425 ns (something around 10.99..... clock cycles). The margin of error of the new function seems to be only 0.7 to 0.8%. Fortunately, I managed to bring this rate below 1% :) . So, in JJ's case, that is an error of less than 0.03 ns (under 0.09 clock cycles). :) :)

I only need to decide on the proper English terms before releasing it. What do you think are the most appropriate names for Data1, Data2 and STDEx2?

Grincheux

Just a suggestion:
Would it be possible to display your dialog box by passing it the offset of the function to test? That way we could test more algorithms.

guga


Grincheux

I am a little afraid whenever I answer your posts. I tell myself you will think I am never happy. Grumpy!
Having such a dialog would be a great interface in our program during the test phase, and better than running a profiler.
I suggest a parameter that says whether to profile the function or to ignore it. That way we could test again in real time under certain circumstances, even if the program was finished many months ago. I imagine a parameter in an ini file that tells the program it must profile.
A little bit like debuggers do.
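
A minimal sketch of the ini-file idea, written in Python for brevity. The section name Profiler and the key Enabled are invented for this example; nothing like this exists in CodeTune as far as this thread shows.

```python
import configparser


def profiling_enabled(ini_text):
    """Return True when the ini text asks the program to profile itself."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # A missing section or key means "do not profile".
    return parser.getboolean("Profiler", "Enabled", fallback=False)


settings = "[Profiler]\nEnabled = 1\n"
if profiling_enabled(settings):
    print("profiling requested: run the benchmark on the selected function")
else:
    print("profiling skipped")
```

In a real program the text would come from a file next to the executable, so profiling could be switched on months later without rebuilding.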