New timing macros

jj2007 · May 16, 2022, 09:42:21 AM

I am trying to improve on the standard timer macros, and would appreciate some test runs. So far it looks ok but far from perfect...

Code Select

	TimerStart
		void Instr_("This is just a stupid little test", "test", 0)
	TimerEnd a small test
	TimerStart
		Let esi=FileRead$("\Masm32\include\Windows.inc")
		void Instr_(esi, "Duplicate", 0)
	TimerEnd a big test
	TimerStart
		fld FP8(10000.0)
		fdiv FP4(10.0)
		fdiv FP4(10.0)
		fdiv FP4(10.0)
		fdiv FP4(10.0)
		fstp st
	TimerEnd 4*fdiv

Code Select

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
0        cycles for an empty loop

96       cycles for a small test
2965    kCycles for a big test
87       cycles for 4*fdiv

102      cycles for a small test
2950    kCycles for a big test
87       cycles for 4*fdiv

96       cycles for a small test
2960    kCycles for a big test
88       cycles for 4*fdiv

95       cycles for a small test
2963    kCycles for a big test
88       cycles for 4*fdiv

96       cycles for a small test
2962    kCycles for a big test
87       cycles for 4*fdiv

HSE · May 16, 2022, 10:04:32 AM

Code Select

Intel(R) Core(TM) i3-10100 CPU @ 3.60GHz
-3       cycles for an empty loop

61       cycles for a small test
2286    kCycles for a big test
51       cycles for 4*fdiv

59       cycles for a small test
2235    kCycles for a big test
52       cycles for 4*fdiv

61       cycles for a small test
2252    kCycles for a big test
51       cycles for 4*fdiv

60       cycles for a small test
2234    kCycles for a big test
52       cycles for 4*fdiv

60       cycles for a small test
2224    kCycles for a big test
54       cycles for 4*fdiv

hit any key

hutch-- · May 16, 2022, 10:13:50 AM

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
0 cycles for an empty loop

63 cycles for a small test
2372 kCycles for a big test
75 cycles for 4*fdiv

60 cycles for a small test
2521 kCycles for a big test
75 cycles for 4*fdiv

65 cycles for a small test
2529 kCycles for a big test
75 cycles for 4*fdiv

65 cycles for a small test
2447 kCycles for a big test
75 cycles for 4*fdiv

64 cycles for a small test
2360 kCycles for a big test
75 cycles for 4*fdiv

hit any key

jj2007 · May 16, 2022, 10:32:37 AM

Thanks, Hector & Hutch

I try to stabilise timings by cutting off the lowest (2%) and highest (40%) results, and taking the average of the middle values. Unfortunately, even this sophisticated approach does not yield good results. Sometimes the entire array of results is higher or lower, and I have no good explanation why

Here is the full array for the "big test"

Let esi=FileRead$("\Masm32\include\Windows.inc")
void Instr_(esi, "Duplicate", 0)

Code Select

0       2930312
1       2933809
2       2936058
3       2936948
4       2938885
5       2941745
6       2943788
7       2945243
8       2945521
9       2949879
10      2952848
11      2953736
12      2956294
13      2957591
14      2958842
15      2961609
16      2964733
17      2965709
18      2970148
19      2970306
20      2970579
21      2970582
22      2970948
23      2974688
24      2978464
25      2979094
26      2980370
27      2980833
28      2981564
29      2981591
30      2983309
31      2983336
32      2983730
33      2983836
34      2984424
35      2986352
36      2986682
37      2988949
38      2990821
39      2990963
40      2990973
41      2991821
42      2992367
43      2993306
44      2993379
45      2993466
46      2995312
47      2995399
48      2996222
49      2997051
50      2997827
51      2999610
52      2999851
53      3000382
54      3002397
55      3002476
56      3003954
57      3004109
58      3004753
59      3004775
60      3005360
61      3007000
62      3007688
63      3009229
64      3010270
65      3012121
66      3014591
67      3015858
68      3016028
69      3016749
70      3017688
71      3017825
72      3018927
73      3019824
74      3020443
75      3021909
76      3022003
77      3022600
78      3023700
79      3023961
80      3028300
81      3028418
82      3029500
83      3029745
84      3029827
85      3031285
86      3031724
87      3033169
88      3033377
89      3035666
90      3035970
91      3037459
92      3038854
93      3039341
94      3039362
95      3041824
96      3043073
97      3043763
98      3044064
99      3047979
100     3048806
101     3050252
102     3052958
103     3053943
104     3056554
105     3058038
106     3060349
107     3060516
108     3060972
109     3065804
110     3066998
111     3070333
112     3070509
113     3071900
114     3071960
115     3072677
116     3073488
117     3073609
118     3075852
119     3076869
120     3077982
121     3080533
122     3081342
123     3081397
124     3082915
125     3083890
126     3087896
127     3088278
128     3089543
129     3089922
130     3091488
131     3092003
132     3093339
133     3093891
134     3095475
135     3097324
136     3097631
137     3098403
138     3099164
139     3102406
140     3108633
141     3109394
142     3109521
143     3110369
144     3115745
145     3118991
146     3122158
147     3124303
148     3124874
149     3127251
150     3128369
151     3129200
152     3143797
153     3153240
154     3153543
155     3159927
156     3161839
157     3173825
158     3179803
159     3188800
160     3190745
161     3195179
162     3210408
163     3221058
164     3225581
165     3227312
166     3232076
167     3246871
168     3254988
169     3257103
170     3265397
171     3270494
172     3270606
173     3285873
174     3290494
175     3293318
176     3295071
177     3298356
178     3311715
179     3313875
180     3331991
181     3335859
182     3355524
183     3360715
184     3367790
185     3372294
186     3400711
187     3425828
188     3426903
189     3502082
190     3596308
191     3917123
192     4030047
193     4219088
194     8213100
195     8340445
196     8358491
197     8548839
198     8652628
199     8803120
200     8929203
2949    kCycles for a big test

For comparison, the 4*fdiv array:

Code Select

0       172
1       172
2       172
3       172
4       172
5       172
6       172
7       172
8       172
9       172
10      172
11      172
12      172
13      172
14      172
15      172
16      172
17      172
18      172
19      172
20      172
21      172
22      172
23      172
24      172
25      172
26      172
27      172
28      172
29      172
30      172
31      172
32      172
33      172
34      172
35      172
36      172
37      172
38      172
39      172
40      172
41      172
42      172
43      172
44      172
45      172
46      172
47      172
48      172
49      172
50      172
51      172
52      172
53      172
54      172
55      172
56      172
57      172
58      172
59      172
60      172
61      172
62      172
63      172
64      172
65      172
66      172
67      172
68      172
69      172
70      172
71      172
72      172
73      172
74      172
75      172
76      172
77      172
78      172
79      172
80      172
81      172
82      172
83      172
84      172
85      172
86      172
87      172
88      172
89      175
90      175
91      175
92      175
93      175
94      175
95      175
96      175
97      175
98      175
99      175
100     175
101     175
102     175
103     175
104     175
105     175
106     175
107     175
108     175
109     175
110     175
111     175
112     175
113     175
114     175
115     175
116     175
117     175
118     175
119     175
120     175
121     175
122     175
123     175
124     175
125     175
126     175
127     175
128     175
129     175
130     175
131     175
132     175
133     175
134     175
135     175
136     175
137     175
138     175
139     175
140     175
141     175
142     175
143     175
144     175
145     175
146     178
147     185
148     185
149     185
150     185
151     185
152     185
153     185
154     185
155     185
156     185
157     185
158     185
159     185
160     185
161     185
162     185
163     185
164     185
165     185
166     185
167     185
168     185
169     185
170     185
171     185
172     185
173     185
174     185
175     185
176     185
177     185
178     185
179     185
180     185
181     185
182     185
183     185
184     185
185     185
186     185
187     185
188     185
189     185
190     185
191     185
192     185
193     185
194     185
195     185
196     185
197     185
198     185
199     185
200     185
201     185
202     185
203     185
204     185
205     185
206     185
207     185
208     185
209     185
210     185
211     185
212     185
213     185
214     185
215     185
216     185
217     185
218     185
219     185
220     185
221     185
222     185
223     185
224     185
225     185
226     185
227     185
228     185
229     185
230     185
231     185
232     185
233     185
234     185
235     185
236     185
237     185
238     185
239     185
240     185
241     185
242     185
243     185
244     185
245     185
246     185
247     185
248     185
249     185
250     185
251     185
252     185
253     185
254     185
255     185
256     185
257     185
258     185
259     185
260     185
261     185
262     185
263     185
264     185
265     185
266     185
267     185
268     185
269     185
270     185
271     185
272     185
273     185
274     185
275     185
276     185
277     185
278     185
279     185
280     185
281     185
282     185
283     185
284     185
285     185
286     185
287     185
288     185
289     185
290     185
291     185
292     185
293     185
294     185
295     185
296     185
297     185
298     185
299     185
300     185
301     185
302     185
303     185
304     185
305     185
306     185
307     185
308     185
309     185
310     185
311     185
312     185
313     185
314     185
315     185
316     185
317     185
318     185
319     185
320     185
321     185
322     185
323     185
324     185
325     185
326     185
327     185
328     185
329     185
330     185
331     185
332     185
333     185
334     185
335     185
336     185
337     185
338     185
339     185
340     185
341     185
342     185
343     185
344     185
345     185
346     185
347     185
348     185
349     185
350     185
351     185
352     185
353     185
354     185
355     185
356     185
357     185
358     185
359     185
360     185
361     185
362     185
363     185
364     185
365     185
366     185
367     185
368     185
369     185
370     185
371     185
372     185
373     185
374     185
375     185
376     185
377     185
378     185
379     185
380     185
381     185
382     185
383     185
384     185
385     185
386     185
387     185
388     185
389     185
390     185
391     185
392     185
393     185
394     185
395     185
396     185
397     185
398     185
399     185
400     185
401     185
402     185
403     185
404     185
405     185
406     185
407     185
408     185
409     185
410     185
411     185
412     185
413     185
414     185
415     185
416     185
417     185
418     185
419     185
420     185
421     185
422     185
423     185
424     185
425     185
426     185
427     185
428     185
429     185
430     185
431     185
432     185
433     185
434     185
435     185
436     185
437     185
438     185
439     185
440     185
441     185
442     185
443     185
444     185
445     185
446     185
447     185
448     185
449     185
450     185
451     185
452     185
453     185
454     185
455     185
456     185
457     185
458     185
459     185
460     185
461     185
462     185
463     185
464     185
465     185
466     185
467     185
468     185
469     185
470     185
471     185
472     185
473     185
474     185
475     185
476     185
477     185
478     185
479     185
480     185
481     185
482     185
483     185
484     185
485     185
486     185
487     185
488     185
489     185
490     185
491     185
492     185
493     185
494     185
495     185
496     185
497     185
498     185
499     185
500     185
501     185
502     185
503     185
504     185
505     185
506     185
507     185
508     185
509     185
510     185
511     185
512     185
513     185
514     185
515     185
516     185
517     185
518     185
519     185
520     185
521     185
522     185
523     185
524     185
525     185
526     185
527     185
528     185
529     185
530     185
531     185
532     185
533     185
534     185
535     185
536     185
537     185
538     185
539     185
540     185
541     185
542     185
543     185
544     185
545     185
546     185
547     185
548     185
549     185
550     185
551     185
552     185
553     185
554     185
555     185
556     185
557     185
558     185
559     185
560     185
561     185
562     185
563     185
564     185
565     185
566     185
567     185
568     185
569     188
570     188
571     188
572     188
573     188
574     188
575     188
576     188
577     188
578     188
579     188
580     188
581     188
582     188
583     188
584     188
585     188
586     188
587     188
588     188
589     188
590     188
591     188
592     188
593     188
594     188
595     188
596     188
597     188
598     188
599     188
600     188
601     188
602     188
603     188
604     188
605     188
606     188
607     188
608     188
609     188
610     188
611     188
612     188
613     188
614     188
615     188
616     188
617     188
618     188
619     188
620     188
621     188
622     188
623     188
624     188
625     188
626     188
627     188
628     188
629     188
630     188
631     188
632     188
633     188
634     188
635     188
636     188
637     188
638     188
639     188
640     188
641     188
642     188
643     188
644     188
645     188
646     188
647     188
648     188
649     188
650     188
651     188
652     188
653     188
654     188
655     188
656     188
657     188
658     188
659     188
660     188
661     188
662     188
663     188
664     188
665     188
666     188
667     188
668     188
669     188
670     188
671     188
672     188
673     188
674     188
675     188
676     188
677     188
678     188
679     188
680     188
681     188
682     188
683     188
684     188
685     188
686     188
687     188
688     188
689     188
690     188
691     188
692     188
693     188
694     188
695     188
696     188
697     188
698     188
699     188
700     188
701     188
702     188
703     188
704     188
705     188
706     188
707     188
708     188
709     188
710     188
711     188
712     188
713     188
714     188
715     188
716     188
717     188
718     188
719     188
720     188
721     188
722     188
723     188
724     188
725     188
726     188
727     188
728     188
729     188
730     188
731     188
732     188
733     188
734     188
735     188
736     188
737     188
738     188
739     188
740     188
741     188
742     188
743     188
744     188
745     188
746     188
747     188
748     188
749     188
750     188
751     188
752     188
753     188
754     188
755     188
756     188
757     188
758     188
759     188
760     188
761     188
762     188
763     188
764     188
765     188
766     188
767     188
768     188
769     188
770     188
771     188
772     188
773     188
774     188
775     188
776     188
777     188
778     188
779     188
780     188
781     188
782     188
783     188
784     188
785     188
786     188
787     188
788     188
789     188
790     188
791     188
792     188
793     188
794     188
795     188
796     188
797     188
798     188
799     188
800     188
801     188
802     188
803     188
804     188
805     188
806     188
807     188
808     188
809     188
810     188
811     188
812     188
813     188
814     188
815     188
816     188
817     188
818     188
819     188
820     188
821     188
822     188
823     188
824     188
825     188
826     188
827     188
828     188
829     188
830     188
831     188
832     188
833     188
834     188
835     188
836     188
837     188
838     188
839     188
840     188
841     188
842     188
843     188
844     188
845     188
846     188
847     188
848     188
849     188
850     188
851     188
852     188
853     188
854     188
855     188
856     188
857     188
858     188
859     188
860     188
861     188
862     188
863     188
864     188
865     188
866     188
867     188
868     188
869     188
870     188
871     188
872     188
873     188
874     188
875     188
876     188
877     188
878     188
879     188
880     188
881     188
882     188
883     188
884     188
885     188
886     188
887     188
888     188
889     188
890     188
891     188
892     188
893     188
894     188
895     188
896     188
897     188
898     188
899     188
900     188
901     188
902     188
903     188
904     188
905     188
906     188
907     188
908     188
909     188
910     188
911     188
912     188
913     188
914     188
915     188
916     188
917     188
918     188
919     188
920     188
921     188
922     188
923     188
924     188
925     188
926     188
927     188
928     188
929     188
930     188
931     188
932     188
933     188
934     188
935     188
936     188
937     188
938     188
939     188
940     188
941     188
942     188
943     188
944     188
945     188
946     188
947     188
948     188
949     188
950     191
951     191
952     191
953     191
954     191
955     191
956     191
957     191
958     278
959     278
960     281
961     288
962     288
963     291
964     291
965     291
966     291
967     291
968     291
969     291
970     291
971     291
972     291
973     291
974     291
975     291
976     291
977     291
978     291
979     291
980     291
981     294
982     294
983     294
984     294
985     294
986     294
987     294
988     294
989     294
990     294
991     294
992     294
993     294
994     294
995     294
996     297
997     297
998     397
999     400
1000    821

Note the little jumps:

Code Select

88      172
89      175
...
145     175
146     178
147     185

How do you explain them? It's not the variable processor speed, as it's measured with rdtsc, i.e. cycles. Can't be interrupts either, they would cause much bigger hiccups. So what makes the processor insert 3 cycles here and there?

quarantined · May 16, 2022, 01:14:50 PM

Quote
I try to stabilise timings by cutting off the lowest (2%) and highest (40%) results, and taking the average of the middle values.

There are always going to be inconsistencies which seem to be inherent with the methods used in the timing macros. Not the macro authors fault though. I had once tried several different averaging methods using the timing macros and even a custom rewritten version of the macros. And I got a lot of grief from you jj for doing so.

You'll probably just have to live with the 'outliers'...

Code Select


Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
0        cycles for an empty loop

98       cycles for a small test
5438    kCycles for a big test
108      cycles for 4*fdiv

98       cycles for a small test
5108    kCycles for a big test
108      cycles for 4*fdiv

98       cycles for a small test
4872    kCycles for a big test
108      cycles for 4*fdiv

98       cycles for a small test
4828    kCycles for a big test
108      cycles for 4*fdiv

98       cycles for a small test
4785    kCycles for a big test
108      cycles for 4*fdiv

hit any key

[code]

Yeah, the cycle count should'nt change I would think.

hutch-- · May 16, 2022, 02:14:16 PM

From Michael Webster's to other variations, they all suffer the same problem, CPU hardware has a whole pile of tricks that make timing at very close range, nearly impossible. Calculating the loop and other overhead varies from one CPU and computer setup to another and this is where the main problem lies.

Having tried many over time, I still revert to clocking algos in real time which is just about useless at very short durations. The alternative is to use far longer durations which means very large samples OR high loop counts.

Roughly, if you are looking for the sustain rate of an algo (absolute throughput) you use much longer samples where alternatively, if you want to clock the attack rate, you produce algos with very low overhead and run them through a high count of iterations. Dial a test for what you want an algo to perform and you will get something like useful "comparison" information.

jj2007 · May 16, 2022, 05:59:18 PM

Quote from: quarantined on May 16, 2022, 01:14:50 PMYou'll probably just have to live with the 'outliers'...
...
Yeah, the cycle count should'nt change I would think.

The point is that I don't live with the outliers, because I do eliminate the 40% slowest. And yet, there are runs that are systematically some % slower. Same code, same position, so there is absolutely no reason why some fild [esp] ... fistp myvar (for example) sequence should take 80x 77 cycles in the first run, 80x 83 in the second and 80x 77 again in the third run. In all three runs, there are a few outliers with 99 or even 200 cycles, but they are thrown out by sorting and eliminating the slower ones.

It's not logical.

I attach a new version that allows (by pressing y at the end) to export the sorted arrays of timings. Have a look at the "tails", but also at the start of each array.

In theory, the fastest should be the "true" result, because a cpu can't just drop some cycles if it feels in the right mood, right?

Sample top results (the 0 0 is a bug that I still have to fix):

Code Select

Sorted timings for finding 'Duplicate' with crt strstr
0	0
1	2494306
2	2518611
3	2526306
Sorted timings for finding 'Duplicate' with crt strstr
0	0
1	2471897
2	2532000
3	2538188
Sorted timings for finding 'Duplicate' with crt strstr
0	0
1	2514656
2	2552965
3	2554821
Sorted timings for finding 'Duplicate' with crt strstr
0	0
1	2515663
2	2536230
3	2539026
Sorted timings for finding 'Duplicate' with crt strstr
0	0
1	2520319
2	2551161
3	2557537

Corresponding 'tails' (note #4, 10189333!):

Code Select

198	4024483
199	4029707
200	4168568
2567	kCycles for finding 'Duplicate' with crt strstr
198	4039726
199	4358796
200	4849407
2568	kCycles for finding 'Duplicate' with crt strstr
198	3059861
199	3060097
200	3106652
2563	kCycles for finding 'Duplicate' with crt strstr
198	6868355
199	10076800
200	10189333
2588	kCycles for finding 'Duplicate' with crt strstr
198	4001836
199	4147888
200	4317080
2567	kCycles for finding 'Duplicate' with crt strstr

quarantined · May 17, 2022, 01:13:54 AM

What probably would be a good idea to do if someone had the time is to create a lookup table with the published cycle count (Intel, AMD, other?) for the myriad of instructions for the target cpu. Then iterate through all the instructions in the algorithm and calculate the total cycle count via the cycle count lookup table to find exactly what the cycle count 'should' be. Of course for short sequences can be done manually.
Will depend on exact processor, architecture, etc., of course. Should avoid branching in test algo of course.
Just a thought.

RDTSC problem? Processor problem? Glitch in the proverbial MATRIX? Who knows?

I don't have the Force, so cannot provide the Jedi Mind Tricks that Agner Fog had up his sleeves. I had given up on cycle count comparisons in favor of running a given algo multiple times and getting the average running time for speed comparisons. Usually produces satisfactory results for "is algo A faster than algo B..."

jj2007 · May 17, 2022, 01:57:31 AM

Quote from: quarantined on May 17, 2022, 01:13:54 AMa lookup table with the published cycle count

The times when this used to work are gone, see instruction pipelining. Many instructions can nowadays run in parallel, others have to wait until the previous one has completed. It's very messy.

jj2007 · May 17, 2022, 11:56:50 AM

Just for fun... size the window to see the distribution of cycle counts, it's hilarious (and I was kind: I cut off the slowest 20)

CyCtStart
fldpi
fdiv FP4(0.1)
fstp st
CyCtEnd <fldpi+fdiv+fstp st>

CyCtStart
void Instr_("Hello World, this is a little test of MasmBasic's Instr macro", "macro")
CyCtEnd <Instr>

quarantined · May 17, 2022, 02:56:02 PM

Quote from: jj2007 on May 17, 2022, 01:57:31 AM
The times when this used to work are gone, see instruction pipelining. Many instructions can nowadays run in parallel, others have to wait until the previous one has completed. It's very messy.

Well that really throws a spanner into the gears. I'll check out your latest when I'm back at my computer...

HSE · May 17, 2022, 09:44:36 PM

Hi JJ!

That look really wrong

Could be interesting comparations with MichaelW's timers and GetTickCount.

jj2007 · May 17, 2022, 11:03:07 PM

Hi Hector,

In a console proggie, as attached, the new macros perform much better than MichaelW's \Masm32\macros\timers.asm, especially for somewhat longer tests:

Code Select

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
23       Cycles for some short instructions

18       Cycles for fldpi+1*fdiv+fstp st
18       Cycles for fldpi+1*fdiv+fstp st
188     kCycles for finding 'Duplicate' with Instr_
2503    kCycles for finding 'Duplicate' with InString
2571    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore

18       Cycles for fldpi+1*fdiv+fstp st
18       Cycles for fldpi+1*fdiv+fstp st
191     kCycles for finding 'Duplicate' with Instr_
2494    kCycles for finding 'Duplicate' with InString
2574    kCycles for finding 'Duplicate' with crt strstr
713     kCycles for finding 'Duplicate' with Boyer-Moore

19       Cycles for fldpi+1*fdiv+fstp st
19       Cycles for fldpi+1*fdiv+fstp st
188     kCycles for finding 'Duplicate' with Instr_
2493    kCycles for finding 'Duplicate' with InString
2565    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore

21       Cycles for fldpi+1*fdiv+fstp st
24       Cycles for fldpi+1*fdiv+fstp st
188     kCycles for finding 'Duplicate' with Instr_
2490    kCycles for finding 'Duplicate' with InString
2573    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore

24       Cycles for fldpi+1*fdiv+fstp st
19       Cycles for fldpi+1*fdiv+fstp st
188     kCycles for finding 'Duplicate' with Instr_
2486    kCycles for finding 'Duplicate' with InString
2573    kCycles for finding 'Duplicate' with crt strstr
711     kCycles for finding 'Duplicate' with Boyer-Moore

The interesting question is why that is so dramatically different in a GUI application

HSE · May 17, 2022, 11:56:00 PM

Quote from: jj2007 on May 17, 2022, 11:03:07 PM
the new macros perform much better than MichaelW's \Masm32\macros\timers.asm, especially for somewhat longer tests:

Doesn't matter that

Just interesting to see how are fluctuations of other timings in same graphic.

jj2007 · May 18, 2022, 01:35:31 AM

See yourself... Michael's macros have a loop inside the rdtsc ... test code ... rdtsc sequence.

The MASM Forum

News:

New timing macros

jj2007

HSE

hutch--

jj2007

quarantined

hutch--

jj2007

quarantined

jj2007

jj2007

quarantined

HSE

jj2007

HSE

jj2007