The MASM Forum

General => The Laboratory => Topic started by: jj2007 on May 16, 2022, 09:42:21 AM

Title: New timing macros
Post by: jj2007 on May 16, 2022, 09:42:21 AM
I am trying to improve on the standard timer macros, and would appreciate some test runs. So far it looks ok but far from perfect...

TimerStart
void Instr_("This is just a stupid little test", "test", 0)
TimerEnd a small test
TimerStart
Let esi=FileRead$("\Masm32\include\Windows.inc")
void Instr_(esi, "Duplicate", 0)
TimerEnd a big test
TimerStart
fld FP8(10000.0)
fdiv FP4(10.0)
fdiv FP4(10.0)
fdiv FP4(10.0)
fdiv FP4(10.0)
fstp st
TimerEnd 4*fdiv


Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
0        cycles for an empty loop

96       cycles for a small test
2965    kCycles for a big test
87       cycles for 4*fdiv

102      cycles for a small test
2950    kCycles for a big test
87       cycles for 4*fdiv

96       cycles for a small test
2960    kCycles for a big test
88       cycles for 4*fdiv

95       cycles for a small test
2963    kCycles for a big test
88       cycles for 4*fdiv

96       cycles for a small test
2962    kCycles for a big test
87       cycles for 4*fdiv
Title: Re: New timing macros
Post by: HSE on May 16, 2022, 10:04:32 AM
Intel(R) Core(TM) i3-10100 CPU @ 3.60GHz
-3       cycles for an empty loop

61       cycles for a small test
2286    kCycles for a big test
51       cycles for 4*fdiv

59       cycles for a small test
2235    kCycles for a big test
52       cycles for 4*fdiv

61       cycles for a small test
2252    kCycles for a big test
51       cycles for 4*fdiv

60       cycles for a small test
2234    kCycles for a big test
52       cycles for 4*fdiv

60       cycles for a small test
2224    kCycles for a big test
54       cycles for 4*fdiv

hit any key
Title: Re: New timing macros
Post by: hutch-- on May 16, 2022, 10:13:50 AM

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
0        cycles for an empty loop

63       cycles for a small test
2372    kCycles for a big test
75       cycles for 4*fdiv

60       cycles for a small test
2521    kCycles for a big test
75       cycles for 4*fdiv

65       cycles for a small test
2529    kCycles for a big test
75       cycles for 4*fdiv

65       cycles for a small test
2447    kCycles for a big test
75       cycles for 4*fdiv

64       cycles for a small test
2360    kCycles for a big test
75       cycles for 4*fdiv

hit any key
Title: Re: New timing macros
Post by: jj2007 on May 16, 2022, 10:32:37 AM
Thanks, Hector & Hutch :thup:

I try to stabilise timings by cutting off the lowest (2%) and highest (40%) results, and taking the average of the middle values. Unfortunately, even this sophisticated approach does not yield good results. Sometimes the entire array of results is higher or lower, and I have no good explanation why :sad:

Here is the full array for the "big test"

      Let esi=FileRead$("\Masm32\include\Windows.inc")
      void Instr_(esi, "Duplicate", 0)

0       2930312
1       2933809
2       2936058
3       2936948
4       2938885
5       2941745
6       2943788
7       2945243
8       2945521
9       2949879
10      2952848
11      2953736
12      2956294
13      2957591
14      2958842
15      2961609
16      2964733
17      2965709
18      2970148
19      2970306
20      2970579
21      2970582
22      2970948
23      2974688
24      2978464
25      2979094
26      2980370
27      2980833
28      2981564
29      2981591
30      2983309
31      2983336
32      2983730
33      2983836
34      2984424
35      2986352
36      2986682
37      2988949
38      2990821
39      2990963
40      2990973
41      2991821
42      2992367
43      2993306
44      2993379
45      2993466
46      2995312
47      2995399
48      2996222
49      2997051
50      2997827
51      2999610
52      2999851
53      3000382
54      3002397
55      3002476
56      3003954
57      3004109
58      3004753
59      3004775
60      3005360
61      3007000
62      3007688
63      3009229
64      3010270
65      3012121
66      3014591
67      3015858
68      3016028
69      3016749
70      3017688
71      3017825
72      3018927
73      3019824
74      3020443
75      3021909
76      3022003
77      3022600
78      3023700
79      3023961
80      3028300
81      3028418
82      3029500
83      3029745
84      3029827
85      3031285
86      3031724
87      3033169
88      3033377
89      3035666
90      3035970
91      3037459
92      3038854
93      3039341
94      3039362
95      3041824
96      3043073
97      3043763
98      3044064
99      3047979
100     3048806
101     3050252
102     3052958
103     3053943
104     3056554
105     3058038
106     3060349
107     3060516
108     3060972
109     3065804
110     3066998
111     3070333
112     3070509
113     3071900
114     3071960
115     3072677
116     3073488
117     3073609
118     3075852
119     3076869
120     3077982
121     3080533
122     3081342
123     3081397
124     3082915
125     3083890
126     3087896
127     3088278
128     3089543
129     3089922
130     3091488
131     3092003
132     3093339
133     3093891
134     3095475
135     3097324
136     3097631
137     3098403
138     3099164
139     3102406
140     3108633
141     3109394
142     3109521
143     3110369
144     3115745
145     3118991
146     3122158
147     3124303
148     3124874
149     3127251
150     3128369
151     3129200
152     3143797
153     3153240
154     3153543
155     3159927
156     3161839
157     3173825
158     3179803
159     3188800
160     3190745
161     3195179
162     3210408
163     3221058
164     3225581
165     3227312
166     3232076
167     3246871
168     3254988
169     3257103
170     3265397
171     3270494
172     3270606
173     3285873
174     3290494
175     3293318
176     3295071
177     3298356
178     3311715
179     3313875
180     3331991
181     3335859
182     3355524
183     3360715
184     3367790
185     3372294
186     3400711
187     3425828
188     3426903
189     3502082
190     3596308
191     3917123
192     4030047
193     4219088
194     8213100
195     8340445
196     8358491
197     8548839
198     8652628
199     8803120
200     8929203
2949    kCycles for a big test


For comparison, the 4*fdiv array:
0       172
1       172
2       172
3       172
4       172
5       172
6       172
7       172
8       172
9       172
10      172
11      172
12      172
13      172
14      172
15      172
16      172
17      172
18      172
19      172
20      172
21      172
22      172
23      172
24      172
25      172
26      172
27      172
28      172
29      172
30      172
31      172
32      172
33      172
34      172
35      172
36      172
37      172
38      172
39      172
40      172
41      172
42      172
43      172
44      172
45      172
46      172
47      172
48      172
49      172
50      172
51      172
52      172
53      172
54      172
55      172
56      172
57      172
58      172
59      172
60      172
61      172
62      172
63      172
64      172
65      172
66      172
67      172
68      172
69      172
70      172
71      172
72      172
73      172
74      172
75      172
76      172
77      172
78      172
79      172
80      172
81      172
82      172
83      172
84      172
85      172
86      172
87      172
88      172
89      175
90      175
91      175
92      175
93      175
94      175
95      175
96      175
97      175
98      175
99      175
100     175
101     175
102     175
103     175
104     175
105     175
106     175
107     175
108     175
109     175
110     175
111     175
112     175
113     175
114     175
115     175
116     175
117     175
118     175
119     175
120     175
121     175
122     175
123     175
124     175
125     175
126     175
127     175
128     175
129     175
130     175
131     175
132     175
133     175
134     175
135     175
136     175
137     175
138     175
139     175
140     175
141     175
142     175
143     175
144     175
145     175
146     178
147     185
148     185
149     185
150     185
151     185
152     185
153     185
154     185
155     185
156     185
157     185
158     185
159     185
160     185
161     185
162     185
163     185
164     185
165     185
166     185
167     185
168     185
169     185
170     185
171     185
172     185
173     185
174     185
175     185
176     185
177     185
178     185
179     185
180     185
181     185
182     185
183     185
184     185
185     185
186     185
187     185
188     185
189     185
190     185
191     185
192     185
193     185
194     185
195     185
196     185
197     185
198     185
199     185
200     185
201     185
202     185
203     185
204     185
205     185
206     185
207     185
208     185
209     185
210     185
211     185
212     185
213     185
214     185
215     185
216     185
217     185
218     185
219     185
220     185
221     185
222     185
223     185
224     185
225     185
226     185
227     185
228     185
229     185
230     185
231     185
232     185
233     185
234     185
235     185
236     185
237     185
238     185
239     185
240     185
241     185
242     185
243     185
244     185
245     185
246     185
247     185
248     185
249     185
250     185
251     185
252     185
253     185
254     185
255     185
256     185
257     185
258     185
259     185
260     185
261     185
262     185
263     185
264     185
265     185
266     185
267     185
268     185
269     185
270     185
271     185
272     185
273     185
274     185
275     185
276     185
277     185
278     185
279     185
280     185
281     185
282     185
283     185
284     185
285     185
286     185
287     185
288     185
289     185
290     185
291     185
292     185
293     185
294     185
295     185
296     185
297     185
298     185
299     185
300     185
301     185
302     185
303     185
304     185
305     185
306     185
307     185
308     185
309     185
310     185
311     185
312     185
313     185
314     185
315     185
316     185
317     185
318     185
319     185
320     185
321     185
322     185
323     185
324     185
325     185
326     185
327     185
328     185
329     185
330     185
331     185
332     185
333     185
334     185
335     185
336     185
337     185
338     185
339     185
340     185
341     185
342     185
343     185
344     185
345     185
346     185
347     185
348     185
349     185
350     185
351     185
352     185
353     185
354     185
355     185
356     185
357     185
358     185
359     185
360     185
361     185
362     185
363     185
364     185
365     185
366     185
367     185
368     185
369     185
370     185
371     185
372     185
373     185
374     185
375     185
376     185
377     185
378     185
379     185
380     185
381     185
382     185
383     185
384     185
385     185
386     185
387     185
388     185
389     185
390     185
391     185
392     185
393     185
394     185
395     185
396     185
397     185
398     185
399     185
400     185
401     185
402     185
403     185
404     185
405     185
406     185
407     185
408     185
409     185
410     185
411     185
412     185
413     185
414     185
415     185
416     185
417     185
418     185
419     185
420     185
421     185
422     185
423     185
424     185
425     185
426     185
427     185
428     185
429     185
430     185
431     185
432     185
433     185
434     185
435     185
436     185
437     185
438     185
439     185
440     185
441     185
442     185
443     185
444     185
445     185
446     185
447     185
448     185
449     185
450     185
451     185
452     185
453     185
454     185
455     185
456     185
457     185
458     185
459     185
460     185
461     185
462     185
463     185
464     185
465     185
466     185
467     185
468     185
469     185
470     185
471     185
472     185
473     185
474     185
475     185
476     185
477     185
478     185
479     185
480     185
481     185
482     185
483     185
484     185
485     185
486     185
487     185
488     185
489     185
490     185
491     185
492     185
493     185
494     185
495     185
496     185
497     185
498     185
499     185
500     185
501     185
502     185
503     185
504     185
505     185
506     185
507     185
508     185
509     185
510     185
511     185
512     185
513     185
514     185
515     185
516     185
517     185
518     185
519     185
520     185
521     185
522     185
523     185
524     185
525     185
526     185
527     185
528     185
529     185
530     185
531     185
532     185
533     185
534     185
535     185
536     185
537     185
538     185
539     185
540     185
541     185
542     185
543     185
544     185
545     185
546     185
547     185
548     185
549     185
550     185
551     185
552     185
553     185
554     185
555     185
556     185
557     185
558     185
559     185
560     185
561     185
562     185
563     185
564     185
565     185
566     185
567     185
568     185
569     188
570     188
571     188
572     188
573     188
574     188
575     188
576     188
577     188
578     188
579     188
580     188
581     188
582     188
583     188
584     188
585     188
586     188
587     188
588     188
589     188
590     188
591     188
592     188
593     188
594     188
595     188
596     188
597     188
598     188
599     188
600     188
601     188
602     188
603     188
604     188
605     188
606     188
607     188
608     188
609     188
610     188
611     188
612     188
613     188
614     188
615     188
616     188
617     188
618     188
619     188
620     188
621     188
622     188
623     188
624     188
625     188
626     188
627     188
628     188
629     188
630     188
631     188
632     188
633     188
634     188
635     188
636     188
637     188
638     188
639     188
640     188
641     188
642     188
643     188
644     188
645     188
646     188
647     188
648     188
649     188
650     188
651     188
652     188
653     188
654     188
655     188
656     188
657     188
658     188
659     188
660     188
661     188
662     188
663     188
664     188
665     188
666     188
667     188
668     188
669     188
670     188
671     188
672     188
673     188
674     188
675     188
676     188
677     188
678     188
679     188
680     188
681     188
682     188
683     188
684     188
685     188
686     188
687     188
688     188
689     188
690     188
691     188
692     188
693     188
694     188
695     188
696     188
697     188
698     188
699     188
700     188
701     188
702     188
703     188
704     188
705     188
706     188
707     188
708     188
709     188
710     188
711     188
712     188
713     188
714     188
715     188
716     188
717     188
718     188
719     188
720     188
721     188
722     188
723     188
724     188
725     188
726     188
727     188
728     188
729     188
730     188
731     188
732     188
733     188
734     188
735     188
736     188
737     188
738     188
739     188
740     188
741     188
742     188
743     188
744     188
745     188
746     188
747     188
748     188
749     188
750     188
751     188
752     188
753     188
754     188
755     188
756     188
757     188
758     188
759     188
760     188
761     188
762     188
763     188
764     188
765     188
766     188
767     188
768     188
769     188
770     188
771     188
772     188
773     188
774     188
775     188
776     188
777     188
778     188
779     188
780     188
781     188
782     188
783     188
784     188
785     188
786     188
787     188
788     188
789     188
790     188
791     188
792     188
793     188
794     188
795     188
796     188
797     188
798     188
799     188
800     188
801     188
802     188
803     188
804     188
805     188
806     188
807     188
808     188
809     188
810     188
811     188
812     188
813     188
814     188
815     188
816     188
817     188
818     188
819     188
820     188
821     188
822     188
823     188
824     188
825     188
826     188
827     188
828     188
829     188
830     188
831     188
832     188
833     188
834     188
835     188
836     188
837     188
838     188
839     188
840     188
841     188
842     188
843     188
844     188
845     188
846     188
847     188
848     188
849     188
850     188
851     188
852     188
853     188
854     188
855     188
856     188
857     188
858     188
859     188
860     188
861     188
862     188
863     188
864     188
865     188
866     188
867     188
868     188
869     188
870     188
871     188
872     188
873     188
874     188
875     188
876     188
877     188
878     188
879     188
880     188
881     188
882     188
883     188
884     188
885     188
886     188
887     188
888     188
889     188
890     188
891     188
892     188
893     188
894     188
895     188
896     188
897     188
898     188
899     188
900     188
901     188
902     188
903     188
904     188
905     188
906     188
907     188
908     188
909     188
910     188
911     188
912     188
913     188
914     188
915     188
916     188
917     188
918     188
919     188
920     188
921     188
922     188
923     188
924     188
925     188
926     188
927     188
928     188
929     188
930     188
931     188
932     188
933     188
934     188
935     188
936     188
937     188
938     188
939     188
940     188
941     188
942     188
943     188
944     188
945     188
946     188
947     188
948     188
949     188
950     191
951     191
952     191
953     191
954     191
955     191
956     191
957     191
958     278
959     278
960     281
961     288
962     288
963     291
964     291
965     291
966     291
967     291
968     291
969     291
970     291
971     291
972     291
973     291
974     291
975     291
976     291
977     291
978     291
979     291
980     291
981     294
982     294
983     294
984     294
985     294
986     294
987     294
988     294
989     294
990     294
991     294
992     294
993     294
994     294
995     294
996     297
997     297
998     397
999     400
1000    821


Note the little jumps:
88      172
89      175
...
145     175
146     178
147     185


How do you explain them? It's not the variable processor speed, as it's measured with rdtsc, i.e. cycles. Can't be interrupts either, they would cause much bigger hiccups. So what makes the processor insert 3 cycles here and there?  :rolleyes:
Title: Re: New timing macros
Post by: quarantined on May 16, 2022, 01:14:50 PM
Quote
I try to stabilise timings by cutting off the lowest (2%) and highest (40%) results, and taking the average of the middle values.

:badgrin:  :badgrin:  :rolleyes:

There are always going to be inconsistencies which seem to be inherent with the methods used in the timing macros. Not the macro authors fault though. I had once tried several different averaging methods using the timing macros and even a custom rewritten version of the macros. And I got a lot of grief from you jj for doing so.

You'll probably just have to live with the 'outliers'...   :tongue:






Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
0        cycles for an empty loop

98       cycles for a small test
5438    kCycles for a big test
108      cycles for 4*fdiv

98       cycles for a small test
5108    kCycles for a big test
108      cycles for 4*fdiv

98       cycles for a small test
4872    kCycles for a big test
108      cycles for 4*fdiv

98       cycles for a small test
4828    kCycles for a big test
108      cycles for 4*fdiv

98       cycles for a small test
4785    kCycles for a big test
108      cycles for 4*fdiv

hit any key

[code]

Yeah, the cycle count should'nt change I would think.
Title: Re: New timing macros
Post by: hutch-- on May 16, 2022, 02:14:16 PM
From Michael Webster's to other variations, they all suffer the same problem, CPU hardware has a whole pile of tricks that make timing at very close range, nearly impossible. Calculating the loop and other overhead varies from one CPU and computer setup to another and this is where the main problem lies.

Having tried many over time, I still revert to clocking algos in real time which is just about useless at very short durations. The alternative is to use far longer durations which means very large samples OR high loop counts.

Roughly, if you are looking for the sustain rate of an algo (absolute throughput) you use much longer samples where alternatively, if you want to clock the attack rate, you produce algos with very low overhead and run them through a high count of iterations. Dial a test for what you want an algo to perform and you will get something like useful "comparison" information.
Title: Re: New timing macros
Post by: jj2007 on May 16, 2022, 05:59:18 PM
Quote from: quarantined on May 16, 2022, 01:14:50 PMYou'll probably just have to live with the 'outliers'...   :tongue:
...
Yeah, the cycle count should'nt change I would think.

The point is that I don't live with the outliers, because I do eliminate the 40% slowest. And yet, there are runs that are systematically some % slower. Same code, same position, so there is absolutely no reason why some fild [esp] ... fistp myvar (for example) sequence should take 80x 77 cycles in the first run, 80x 83 in the second and 80x 77 again in the third run. In all three runs, there are a few outliers with 99 or even 200 cycles, but they are thrown out by sorting and eliminating the slower ones.

It's not logical.

I attach a new version that allows (by pressing y at the end) to export the sorted arrays of timings. Have a look at the "tails", but also at the start of each array.

In theory, the fastest should be the "true" result, because a cpu can't just drop some cycles if it feels in the right mood, right?

Sample top results (the 0  0 is a bug that I still have to fix):
Sorted timings for finding 'Duplicate' with crt strstr
0 0
1 2494306
2 2518611
3 2526306
Sorted timings for finding 'Duplicate' with crt strstr
0 0
1 2471897
2 2532000
3 2538188
Sorted timings for finding 'Duplicate' with crt strstr
0 0
1 2514656
2 2552965
3 2554821
Sorted timings for finding 'Duplicate' with crt strstr
0 0
1 2515663
2 2536230
3 2539026
Sorted timings for finding 'Duplicate' with crt strstr
0 0
1 2520319
2 2551161
3 2557537


Corresponding 'tails' (note #4, 10189333!):
198 4024483
199 4029707
200 4168568
2567 kCycles for finding 'Duplicate' with crt strstr
198 4039726
199 4358796
200 4849407
2568 kCycles for finding 'Duplicate' with crt strstr
198 3059861
199 3060097
200 3106652
2563 kCycles for finding 'Duplicate' with crt strstr
198 6868355
199 10076800
200 10189333
2588 kCycles for finding 'Duplicate' with crt strstr
198 4001836
199 4147888
200 4317080
2567 kCycles for finding 'Duplicate' with crt strstr
Title: Re: New timing macros
Post by: quarantined on May 17, 2022, 01:13:54 AM
What probably would be a good idea to do if someone had the time is to create a lookup table with the published cycle count (Intel, AMD, other?) for the myriad of instructions for the target cpu. Then iterate through all the instructions in the algorithm and calculate the total cycle count via the cycle count lookup table to find exactly what the cycle count 'should' be. Of course for short sequences can be done manually.
Will depend on exact processor, architecture, etc., of course. Should avoid branching in test algo of course.
Just a thought.

RDTSC problem? Processor problem? Glitch in the proverbial MATRIX? Who knows?

I don't have the Force, so cannot provide the Jedi Mind Tricks that Agner Fog had up his sleeves. I had given up on cycle count comparisons in favor of running a given algo multiple times and getting the average running time for speed comparisons. Usually produces satisfactory results for "is algo A faster than algo B..."
Title: Re: New timing macros
Post by: jj2007 on May 17, 2022, 01:57:31 AM
Quote from: quarantined on May 17, 2022, 01:13:54 AMa lookup table with the published cycle count

The times when this used to work are gone, see instruction pipelining (https://en.wikipedia.org/wiki/Instruction_pipelining). Many instructions can nowadays run in parallel, others have to wait until the previous one has completed. It's very messy.
Title: Re: New timing macros
Post by: jj2007 on May 17, 2022, 11:56:50 AM
Just for fun... size the window to see the distribution of cycle counts, it's hilarious (and I was kind: I cut off the slowest 20) :cool:

  CyCtStart
        fldpi
        fdiv FP4(0.1)
        fstp st
  CyCtEnd <fldpi+fdiv+fstp st>


  CyCtStart
        void Instr_("Hello World, this is a little test of MasmBasic's Instr macro", "macro")
  CyCtEnd <Instr>
Title: Re: New timing macros
Post by: quarantined on May 17, 2022, 02:56:02 PM
Quote from: jj2007 on May 17, 2022, 01:57:31 AM
The times when this used to work are gone, see instruction pipelining (https://en.wikipedia.org/wiki/Instruction_pipelining). Many instructions can nowadays run in parallel, others have to wait until the previous one has completed. It's very messy.

Well that really throws a spanner into the gears. I'll check out your latest when I'm back at my computer...
Title: Re: New timing macros
Post by: HSE on May 17, 2022, 09:44:36 PM
Hi JJ!

That look really wrong  :biggrin:

Could be interesting comparations with MichaelW's timers and GetTickCount.
Title: Re: New timing macros
Post by: jj2007 on May 17, 2022, 11:03:07 PM
Hi Hector,

In a console proggie, as attached, the new macros perform much better than MichaelW's \Masm32\macros\timers.asm, especially for somewhat longer tests:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
23       Cycles for some short instructions

18       Cycles for fldpi+1*fdiv+fstp st
18       Cycles for fldpi+1*fdiv+fstp st
188     kCycles for finding 'Duplicate' with Instr_
2503    kCycles for finding 'Duplicate' with InString
2571    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore

18       Cycles for fldpi+1*fdiv+fstp st
18       Cycles for fldpi+1*fdiv+fstp st
191     kCycles for finding 'Duplicate' with Instr_
2494    kCycles for finding 'Duplicate' with InString
2574    kCycles for finding 'Duplicate' with crt strstr
713     kCycles for finding 'Duplicate' with Boyer-Moore

19       Cycles for fldpi+1*fdiv+fstp st
19       Cycles for fldpi+1*fdiv+fstp st
188     kCycles for finding 'Duplicate' with Instr_
2493    kCycles for finding 'Duplicate' with InString
2565    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore

21       Cycles for fldpi+1*fdiv+fstp st
24       Cycles for fldpi+1*fdiv+fstp st
188     kCycles for finding 'Duplicate' with Instr_
2490    kCycles for finding 'Duplicate' with InString
2573    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore

24       Cycles for fldpi+1*fdiv+fstp st
19       Cycles for fldpi+1*fdiv+fstp st
188     kCycles for finding 'Duplicate' with Instr_
2486    kCycles for finding 'Duplicate' with InString
2573    kCycles for finding 'Duplicate' with crt strstr
711     kCycles for finding 'Duplicate' with Boyer-Moore


The interesting question is why that is so dramatically different in a GUI application :rolleyes:
Title: Re: New timing macros
Post by: HSE on May 17, 2022, 11:56:00 PM
Quote from: jj2007 on May 17, 2022, 11:03:07 PM
the new macros perform much better than MichaelW's \Masm32\macros\timers.asm, especially for somewhat longer tests:

Doesn't matter that :biggrin:
Just interesting to see how are fluctuations of other timings in same graphic.
Title: Re: New timing macros
Post by: jj2007 on May 18, 2022, 01:35:31 AM
See yourself... Michael's macros have a loop inside the rdtsc ... test code ... rdtsc sequence.
Title: Re: New timing macros
Post by: HSE on May 18, 2022, 02:09:37 AM
It's a lot more stable. What is stack?
Title: Re: New timing macros
Post by: jj2007 on May 18, 2022, 02:11:03 AM
stack equ dword ptr [esp]
stack[4] = first argument
etc
Title: Re: New timing macros
Post by: HSE on May 18, 2022, 02:16:40 AM
But must be edi-10
Title: Re: New timing macros
Post by: jj2007 on May 18, 2022, 02:19:01 AM
Correct :thumbsup:

(wow, somebody reads and understands my code :rolleyes:)
Title: Re: New timing macros
Post by: HSE on May 18, 2022, 02:35:21 AM
I can read my own code! (most of the time )
Title: Re: New timing macros
Post by: jj2007 on May 18, 2022, 06:57:51 AM
Hi Hector,

Here is a console program using mostly CyCt* macros plus one of the MichaelW counters:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
23       Cycles for some short instructions

19       Cycles for fldpi+1*fdiv+fstp st
19       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2483    kCycles for finding 'Duplicate' with InString
2551    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2508    kCycles for InString using MichaelW's macro


18       Cycles for fldpi+1*fdiv+fstp st
19       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2480    kCycles for finding 'Duplicate' with InString
2548    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2534    kCycles for InString using MichaelW's macro


19       Cycles for fldpi+1*fdiv+fstp st
18       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2480    kCycles for finding 'Duplicate' with InString
2545    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2506    kCycles for InString using MichaelW's macro


19       Cycles for fldpi+1*fdiv+fstp st
18       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2487    kCycles for finding 'Duplicate' with InString
2551    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2521    kCycles for InString using MichaelW's macro


19       Cycles for fldpi+1*fdiv+fstp st
18       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2483    kCycles for finding 'Duplicate' with InString
2549    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2510    kCycles for InString using MichaelW's macro

MichaelW, Instring: 2508 2534 2506 2521 2510 2534-2506=28
CyCt*, Instring: 2483 2480 2480 2487 2483 2487-2480=7


Run the attachment, and you will see that the new macros have another advantage :cool:
Title: Re: New timing macros
Post by: HSE on May 18, 2022, 08:42:04 AM
 :biggrin:

Intel(R) Core(TM) i3-10100 CPU @ 3.60GHz
42       Cycles for some short instructions

31       Cycles for fldpi+1*fdiv+fstp st
25       Cycles for fldpi+1*fdiv+fstp st
211     kCycles for finding 'Duplicate' with Instr_
2093    kCycles for finding 'Duplicate' with InString
1886    kCycles for finding 'Duplicate' with crt strstr
712     kCycles for finding 'Duplicate' with Boyer-Moore
2118    kCycles for InString using MichaelW's macro


34       Cycles for fldpi+1*fdiv+fstp st
26       Cycles for fldpi+1*fdiv+fstp st
209     kCycles for finding 'Duplicate' with Instr_
2064    kCycles for finding 'Duplicate' with InString
1880    kCycles for finding 'Duplicate' with crt strstr
696     kCycles for finding 'Duplicate' with Boyer-Moore
2114    kCycles for InString using MichaelW's macro


34       Cycles for fldpi+1*fdiv+fstp st
25       Cycles for fldpi+1*fdiv+fstp st
210     kCycles for finding 'Duplicate' with Instr_
2081    kCycles for finding 'Duplicate' with InString
1883    kCycles for finding 'Duplicate' with crt strstr
696     kCycles for finding 'Duplicate' with Boyer-Moore
2117    kCycles for InString using MichaelW's macro


37       Cycles for fldpi+1*fdiv+fstp st
24       Cycles for fldpi+1*fdiv+fstp st
210     kCycles for finding 'Duplicate' with Instr_
2088    kCycles for finding 'Duplicate' with InString
1926    kCycles for finding 'Duplicate' with crt strstr
713     kCycles for finding 'Duplicate' with Boyer-Moore
2119    kCycles for InString using MichaelW's macro


38       Cycles for fldpi+1*fdiv+fstp st
21       Cycles for fldpi+1*fdiv+fstp st
209     kCycles for finding 'Duplicate' with Instr_
2088    kCycles for finding 'Duplicate' with InString
1926    kCycles for finding 'Duplicate' with crt strstr
695     kCycles for finding 'Duplicate' with Boyer-Moore
2117    kCycles for InString using MichaelW's macro


MichaelW, Instring:    2119-2114= 5
CyCt*, Instring:       2093-2064=29
Title: Re: New timing macros
Post by: jj2007 on May 18, 2022, 08:47:28 AM
YMMV :cool:

But in general, I get fairly consistent output, and at much, much greater speed. Which is an argument when you are trying to optimise an algo and it takes you many small changes.
Title: Re: New timing macros
Post by: jj2007 on May 19, 2022, 07:29:53 PM
More refining done... it starts to look convincing :tongue:

Can I have some timings, please?

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
+0       Cycles for PI*100
+16      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+13      Cycles for 10 * inc & dec eax
+13      Cycles for 10 * add eax,1 & sub eax,1
+13      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+189     Cycles for finding 'test' with InString

+0       Cycles for PI*100
+16      Cycles for PI*100/10
+4       Cycles for push & pop eax
+0       Cycles for empty loop
+13      Cycles for 10 * inc & dec eax
+16      Cycles for 10 * add eax,1 & sub eax,1
+13      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+187     Cycles for finding 'test' with InString

+0       Cycles for PI*100
+16      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+13      Cycles for 10 * inc & dec eax
+13      Cycles for 10 * add eax,1 & sub eax,1
+13      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+187     Cycles for finding 'test' with InString

+0       Cycles for PI*100
+16      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+13      Cycles for 10 * inc & dec eax
+13      Cycles for 10 * add eax,1 & sub eax,1
+13      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+187     Cycles for finding 'test' with InString
Title: Re: New timing macros
Post by: hutch-- on May 19, 2022, 08:26:05 PM
Could you civilize this a bit. Would not run where I downloaded it and had to be installed on my dev drive.

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+163     Cycles for finding 'test' with InString

+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
189     kCycles for finding 'Duplicate' in Window.inc
+145     Cycles for finding 'test' with InString

+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+145     Cycles for finding 'test' with InString

+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+147     Cycles for finding 'test' with InString
Title: Re: New timing macros
Post by: jj2007 on May 19, 2022, 08:45:24 PM
Here is the civilised version, with an Inkey at the end. The other one runs fine from a DOS prompt, of course  :biggrin:
Title: Re: New timing macros
Post by: hutch-- on May 19, 2022, 09:13:15 PM
This onje at least stops on the inkey but to run it, it must be copied to my dev drive. Would be a lot easier to run if I could just copy windows.inc to where it gets unzipped. I test on a ramdrive, not my dev drive.
Title: Re: New timing macros
Post by: jj2007 on May 19, 2022, 09:17:34 PM
Quote from: hutch-- on May 19, 2022, 09:13:15 PMif I could just copy windows.inc to where it gets unzipped

Now I understand the problem :biggrin:

I know one guy who is very proud about all paths in include files being hardcoded to the root as in \thatfolder\whatever :mrgreen:

Let esi=FileRead$("\Masm32\include\Windows.inc")

Attached a version with embedded Windows.inc, and some more timings. Don't complain that it's a fat attachment :cool:

  ; Let esi=FileRead$("\Masm32\include\Windows.inc")
  Let esi=FileRead$(99)   ; resource ID 99
Title: Re: New timing macros
Post by: FORTRANS on May 20, 2022, 05:01:29 AM
Hi,

   Two laptops.

Intel(R) Pentium(R) M processor 1.70GHz
+3 Cycles for PI*100
+31 Cycles for PI*100/10
+-1 Cycles for push & pop eax
+0 Cycles for empty loop
+16 Cycles for 10 * inc & dec eax
+16 Cycles for 10 * add eax,1 & sub eax,1
+14 Cycles for 10 * inc+dec with lea
+33 Cycles for finding 'Duplicate' in Window.inc
+458 Cycles for finding 'test' with InString

+3 Cycles for PI*100
+31 Cycles for PI*100/10
+-1 Cycles for push & pop eax
+0 Cycles for empty loop
+16 Cycles for 10 * inc & dec eax
+16 Cycles for 10 * add eax,1 & sub eax,1
+14 Cycles for 10 * inc+dec with lea
+33 Cycles for finding 'Duplicate' in Window.inc
+467 Cycles for finding 'test' with InString

+3 Cycles for PI*100
+31 Cycles for PI*100/10
+-1 Cycles for push & pop eax
+0 Cycles for empty loop
+16 Cycles for 10 * inc & dec eax
+16 Cycles for 10 * add eax,1 & sub eax,1
+14 Cycles for 10 * inc+dec with lea
+33 Cycles for finding 'Duplicate' in Window.inc
+423 Cycles for finding 'test' with InString

+3 Cycles for PI*100
+31 Cycles for PI*100/10
+-1 Cycles for push & pop eax
+0 Cycles for empty loop
+16 Cycles for 10 * inc & dec eax
+16 Cycles for 10 * add eax,1 & sub eax,1
+14 Cycles for 10 * inc+dec with lea
+33 Cycles for finding 'Duplicate' in Window.inc
+422 Cycles for finding 'test' with InString

more (y)?

+32 Cycles for fldpi+1*fdiv+fstp st
+3581 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

+32 Cycles for fldpi+1*fdiv+fstp st
+3582 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

+32 Cycles for fldpi+1*fdiv+fstp st
+3582 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

+32 Cycles for fldpi+1*fdiv+fstp st
+3581 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

+32 Cycles for fldpi+1*fdiv+fstp st
+3582 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

hit any key - bye


Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz
+1 Cycles for PI*100
+15 Cycles for PI*100/10
+3 Cycles for push & pop eax
+0 Cycles for empty loop
+17 Cycles for 10 * inc & dec eax
+17 Cycles for 10 * add eax,1 & sub eax,1
+17 Cycles for 10 * inc+dec with lea
227 kCycles for finding 'Duplicate' in Window.inc
+175 Cycles for finding 'test' with InString

+1 Cycles for PI*100
+15 Cycles for PI*100/10
+3 Cycles for push & pop eax
+0 Cycles for empty loop
+17 Cycles for 10 * inc & dec eax
+17 Cycles for 10 * add eax,1 & sub eax,1
+17 Cycles for 10 * inc+dec with lea
227 kCycles for finding 'Duplicate' in Window.inc
+174 Cycles for finding 'test' with InString

+1 Cycles for PI*100
+15 Cycles for PI*100/10
+3 Cycles for push & pop eax
+0 Cycles for empty loop
+17 Cycles for 10 * inc & dec eax
+17 Cycles for 10 * add eax,1 & sub eax,1
+17 Cycles for 10 * inc+dec with lea
227 kCycles for finding 'Duplicate' in Window.inc
+174 Cycles for finding 'test' with InString

+1 Cycles for PI*100
+15 Cycles for PI*100/10
+3 Cycles for push & pop eax
+0 Cycles for empty loop
+17 Cycles for 10 * inc & dec eax
+17 Cycles for 10 * add eax,1 & sub eax,1
+17 Cycles for 10 * inc+dec with lea
227 kCycles for finding 'Duplicate' in Window.inc
+175 Cycles for finding 'test' with InString

more (y)?

+33 Cycles for fldpi+1*fdiv+fstp st
+3522 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2418 kCycles for finding 'Duplicate' with InString
2265 kCycles for finding 'Duplicate' with crt strstr
826 kCycles for finding 'Duplicate' with Boyer-Moore

+15 Cycles for fldpi+1*fdiv+fstp st
+3108 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2419 kCycles for finding 'Duplicate' with InString
2264 kCycles for finding 'Duplicate' with crt strstr
825 kCycles for finding 'Duplicate' with Boyer-Moore

+15 Cycles for fldpi+1*fdiv+fstp st
+3108 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2418 kCycles for finding 'Duplicate' with InString
2264 kCycles for finding 'Duplicate' with crt strstr
825 kCycles for finding 'Duplicate' with Boyer-Moore

+15 Cycles for fldpi+1*fdiv+fstp st
+3108 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2419 kCycles for finding 'Duplicate' with InString
2264 kCycles for finding 'Duplicate' with crt strstr
825 kCycles for finding 'Duplicate' with Boyer-Moore

+15 Cycles for fldpi+1*fdiv+fstp st
+3109 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2419 kCycles for finding 'Duplicate' with InString
2264 kCycles for finding 'Duplicate' with crt strstr
825 kCycles for finding 'Duplicate' with Boyer-Moore

hit any key - bye


Cheers,

Steve N.
Title: Re: New timing macros
Post by: jj2007 on May 20, 2022, 05:24:06 AM
Thanks, Steve, now it looks reasonably stable :thumbsup:

A typical usage example:

CyCtStart
fldpi
fmul FP4(100.0)
fistp testDD
CyCtEnd PI*100
CyCtStart
fldpi
fmul FP4(100.0)
fdiv FP4(10.0)
fistp testDD
CyCtEnd PI*100/10
CyCtStart
push eax
pop eax
CyCtEnd push & pop eax
Title: Re: New timing macros
Post by: TimoVJL on May 20, 2022, 08:27:07 AM
AMD Athlon(tm) II X2 220 Processor
+0       Cycles for PI*100
+17      Cycles for PI*100/10
+1       Cycles for push & pop eax
+0       Cycles for empty loop
+16      Cycles for 10 * inc & dec eax
+16      Cycles for 10 * add eax,1 & sub eax,1
+16      Cycles for 10 * inc+dec with lea
481     kCycles for finding 'Duplicate' in Window.inc
+471     Cycles for finding 'test' with InString

+0       Cycles for PI*100
+17      Cycles for PI*100/10
+0       Cycles for push & pop eax
+0       Cycles for empty loop
+16      Cycles for 10 * inc & dec eax
+16      Cycles for 10 * add eax,1 & sub eax,1
+16      Cycles for 10 * inc+dec with lea
481     kCycles for finding 'Duplicate' in Window.inc
+476     Cycles for finding 'test' with InString

+0       Cycles for PI*100
+17      Cycles for PI*100/10
+1       Cycles for push & pop eax
+0       Cycles for empty loop
+16      Cycles for 10 * inc & dec eax
+16      Cycles for 10 * add eax,1 & sub eax,1
+16      Cycles for 10 * inc+dec with lea
481     kCycles for finding 'Duplicate' in Window.inc
+477     Cycles for finding 'test' with InString

+0       Cycles for PI*100
+17      Cycles for PI*100/10
+0       Cycles for push & pop eax
+0       Cycles for empty loop
+16      Cycles for 10 * inc & dec eax
+16      Cycles for 10 * add eax,1 & sub eax,1
+16      Cycles for 10 * inc+dec with lea
481     kCycles for finding 'Duplicate' in Window.inc
+477     Cycles for finding 'test' with InString

more (y)?

Title: Re: New timing macros
Post by: jj2007 on May 20, 2022, 09:27:26 AM
Thanks @everybody :thup:

I attach a new graphical version. Two interesting aspects:
- when sizing the window, the cycle count for ...
      fldpi
      fdiv FP4(0.1)
      fstp st
... is pretty stable at 18 cycles for my macros and 14 cycles for Michael Webster's macros. Why that difference? I can only guess that my "isolation" of the core code with nops all over the place place a role:

Address   Hex dump          Command                                  Comments
00401403   .  0FA2          .cpuid
00401405   .  0F31          .rdtsc
00401407   .  A3 F0A34000   .mov [CyCtrdtsc], eax
0040140C   .  8915 F4A34000 .mov [40A3F4], edx
00401412   .  8B1D E4A34000 .mov ebx, [CyCtinner]
00401418   .  90            .nop
00401419   .  90            .nop
0040141A   .  90            .nop
0040141B   .  90            .nop
0040141C   .  90            .nop
0040141D   .  90            .nop
0040141E   .  90            .nop
0040141F   .  90            .nop
00401420   >  D9EB          .fldpi                                   ; loop start ---------------------
00401422   .  D835 68904000 .fdiv dword ptr [??0232]                 ; float 0.1000000
00401428   .  DDD8          .fstp st
0040142A   .  90            .nop
0040142B   .  90            .nop
0040142C   .  90            .nop
0040142D   .  90            .nop
0040142E   .  90            .nop
0040142F   .  90            .nop
00401430   .  90            .nop
00401431   .  90            .nop
00401432   .  4B            .dec ebx
00401433   .^ 79 EB         .jns short CyCtcode2_s                   ; jump back ---------------------
00401435   .  50            .push eax
00401436   .  51            .push ecx
00401437   .  52            .push edx
00401438   .  33C0          .xor eax, eax
0040143A   .  0FA2          .cpuid
0040143C   .  0F31          .rdtsc
Title: Re: New timing macros
Post by: HSE on May 20, 2022, 09:55:41 AM
 :biggrin:

Perhaps is a bad idea to take the timing at same time that paint (you are not calling for High Priority¿?):
Title: Re: New timing macros
Post by: jj2007 on May 20, 2022, 10:02:13 AM
Quote from: HSE on May 20, 2022, 09:55:41 AM
Perhaps is a bad idea to take the timing at same time that paint.:

Well, it's not taken during the WM_PAINT processing. But apparently there is an influence. When quickly sizing the window (lots of painting), the cycle counts go down, though. I have absolutely no clue why that is so.

Anyway, for timing your algo you would use a console program. The GUI version is for illustrative purposes, i.e. for showing the skewed distribution of results.

To build the source attached above, you need the latest MasmBasic version of 20 May 2022. (http://masm32.com/board/index.php?topic=94.0)
Title: Re: New timing macros
Post by: hutch-- on May 20, 2022, 10:48:51 AM
mOST RECENT VERSION !

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
187     kCycles for finding 'Duplicate' in Window.inc
+145     Cycles for finding 'test' with InString

+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
187     kCycles for finding 'Duplicate' in Window.inc
+144     Cycles for finding 'test' with InString

+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
187     kCycles for finding 'Duplicate' in Window.inc
+144     Cycles for finding 'test' with InString

+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
187     kCycles for finding 'Duplicate' in Window.inc
+143     Cycles for finding 'test' with InString

more (y)?


What is the "more (y)?" for ?  :tongue:
Title: Re: New timing macros
Post by: jj2007 on May 20, 2022, 06:29:08 PM
Quote from: hutch-- on May 20, 2022, 10:48:51 AM
What is the "more (y)?" for ?  :tongue:

Be courageous, type y :biggrin:
Title: Re: New timing macros
Post by: hutch-- on May 20, 2022, 06:43:48 PM
I did, nothing happened apart from exiting, is that what the "y" is for ?  :tongue:
Title: Re: New timing macros
Post by: jj2007 on May 20, 2022, 07:01:24 PM
Strange, I get this from reply #27 (http://masm32.com/board/index.php?topic=10060.msg110088#msg110088):

more (y)?

+73      Cycles for fldpi+1*fdiv+fstp st
+12     kCycles for finding 'test' with Instr_
328     kCycles for finding 'Duplicate' with Instr_
2492    kCycles for finding 'Duplicate' with InString
2553    kCycles for finding 'Duplicate' with crt strstr
720     kCycles for finding 'Duplicate' with Boyer-Moore

+16      Cycles for fldpi+1*fdiv+fstp st
+3286    Cycles for finding 'test' with Instr_
192     kCycles for finding 'Duplicate' with Instr_
2496    kCycles for finding 'Duplicate' with InString
2560    kCycles for finding 'Duplicate' with crt strstr
720     kCycles for finding 'Duplicate' with Boyer-Moore

+16      Cycles for fldpi+1*fdiv+fstp st
+3286    Cycles for finding 'test' with Instr_
192     kCycles for finding 'Duplicate' with Instr_
2494    kCycles for finding 'Duplicate' with InString
2560    kCycles for finding 'Duplicate' with crt strstr
719     kCycles for finding 'Duplicate' with Boyer-Moore

+17      Cycles for fldpi+1*fdiv+fstp st
+3309    Cycles for finding 'test' with Instr_
192     kCycles for finding 'Duplicate' with Instr_
2493    kCycles for finding 'Duplicate' with InString
2555    kCycles for finding 'Duplicate' with crt strstr
720     kCycles for finding 'Duplicate' with Boyer-Moore

+16      Cycles for fldpi+1*fdiv+fstp st
+3313    Cycles for finding 'test' with Instr_
192     kCycles for finding 'Duplicate' with Instr_
2491    kCycles for finding 'Duplicate' with InString
2558    kCycles for finding 'Duplicate' with crt strstr
720     kCycles for finding 'Duplicate' with Boyer-Moore

hit any key
Title: Re: New timing macros
Post by: felipe on May 24, 2022, 04:01:59 AM
Hi all, long time since my last reply/post in the forum  :sad:. One question please: generally speaking, what would be more accurate to test performance in a windows program, counting cycles or counting time (with QueryPerformanceCounter)?
Title: Re: New timing macros
Post by: daydreamer on May 24, 2022, 04:57:15 AM
Quote from: felipe on May 24, 2022, 04:01:59 AM
Hi all, long time since my last reply/post in the forum  :sad:. One question please: generally speaking, what would be more accurate to test performance in a windows program, counting cycles or counting time (with QueryPerformanceCounter)?
Welcome back felipe  :thumbsup:
Counting cycles,but invoke gettimemillis before and after easier to compare between your asm program and other languages timings for example very fast, huge fibonnaci calculations with matrices
http://masm32.com/board/index.php?topic=9773.0 (http://masm32.com/board/index.php?topic=9773.0)

Title: Re: New timing macros
Post by: jj2007 on May 24, 2022, 06:39:49 AM
Quote from: felipe on May 24, 2022, 04:01:59 AMcounting cycles or counting time

Time varies with cpu speed, cycles don't.

Quote from: daydreamer on May 24, 2022, 04:57:15 AMinvoke gettimemillis before and after

Interesting, can you post an example using gettimemillis?
Title: Re: New timing macros
Post by: felipe on May 24, 2022, 07:24:07 AM
This is an interesting topic. Let me put a quote from a page of intel about clock cycles to help a little bit to new ones (like my self  :toothy:):

QuoteSometimes, multiple instructions are completed in a single clock cycle; in other cases, one instruction might be handled over multiple clock cycles. Since different CPU designs handle instructions differently, it's best to compare clock speeds within the same CPU brand and generation.

For example, a CPU with a higher clock speed from five years ago might be outperformed by a new CPU with a lower clock speed, as the newer architecture deals with instructions more efficiently. An X-series Intel® processor might outperform a K-series processor with a higher clock speed, because it splits tasks between more cores and features a larger CPU cache. But within the same generation of CPUs, a processor with a higher clock speed will generally outperform a processor with a lower clock speed across many applications. This is why it's important to compare processors from the same brand and generation.

https://www.intel.com/content/www/us/en/gaming/resources/cpu-clock-speed.html (https://www.intel.com/content/www/us/en/gaming/resources/cpu-clock-speed.html)
Title: Re: New timing macros
Post by: hutch-- on May 24, 2022, 07:37:19 AM
Hi felipe,

Don't be lead astray, the only reliable timing technique is real time[tm]. Any recent CPU does many things that effect trying to perform close range timings, variable frequency depending on load, OS ring0 over-rides, core sharing, CPU time slicing, CPU hardware differences etc etc etc ....

Timing is at best unreliable due to these factors and the best I have come up with over many years is to design each test to match the algo's usage. Run the test for long enough, (over 1 second usually does the job). With the choice of attack speed versus sustained data transfer and everything inbetween.

Short duration tests run many times versus long duration tests for absolute transfer speed.
Title: Re: New timing macros
Post by: felipe on May 24, 2022, 07:59:48 AM
I share that vision Hutch, i mean, if you can see the program running fast then you finished the job. If you want your program to run fast on multiple cpu's, you test the program in all the different cpu's you can. If you have the task of constructing the fastest program in the world, you will still need to face the reallity that the program will run different in two different cpu's.  :icon_idea:

Although i recognize that counting cycles, i mean, bringing out strange numbers from this electronic world, can be fun... :mrgreen:
Title: Re: New timing macros
Post by: quarantined on May 24, 2022, 08:41:34 AM
off topic == on

Quote from: felipe on May 24, 2022, 04:01:59 AM
Hi all, long time since my last reply/post in the forum

Well, hello. I haven't seen you around here since forever. How are you?

off topic == off. Oops
Title: Re: New timing macros
Post by: jj2007 on May 24, 2022, 08:51:52 AM
Quote from: felipe on May 24, 2022, 07:59:48 AMif you can see the program running fast then you finished the job.

Sure, but between starting the work and finishing, there may be weeks or years. In the meantime, you might want to know which instructions are the fastest in the 100+ innermost loops of your program. For example, are inc, add and lea any different? Should you prefer one of them? Does one of them really suck on AMD cpus, or on the cheap Celerons that your company uses?

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
once:
0        Cycles for inc eax
0        Cycles for add eax,1
0        Cycles for lea eax,[eax+1]
0        Cycles for inc eax
0        Cycles for add eax,1
0        Cycles for lea eax,[eax+1]
0        Cycles for inc eax
0        Cycles for add eax,1
0        Cycles for lea eax,[eax+1]
0        Cycles for inc eax
0        Cycles for add eax,1
0        Cycles for lea eax,[eax+1]

100x:
87       Cycles for 100*inc eax
96       Cycles for 100*add eax,1
84       Cycles for 100*lea eax,[eax+1]
84       Cycles for 100*inc eax
81       Cycles for 100*add eax,1
83       Cycles for 100*lea eax,[eax+1]
84       Cycles for 100*inc eax
85       Cycles for 100*add eax,1
84       Cycles for 100*lea eax,[eax+1]
84       Cycles for 100*inc eax
78       Cycles for 100*add eax,1
84       Cycles for 100*lea eax,[eax+1]
1 bytes for inc eax
3 bytes for add eax, 1
3 bytes for lea eax, [eax+1]
100 bytes for 100*inc eax
300 bytes for 100*add eax, 1
300 bytes for 100*lea eax, [eax+1]
Title: Re: New timing macros
Post by: felipe on May 24, 2022, 09:40:44 AM
Actually that's a good point jj. You will need to test during the development. How you test performance specifically i suppose will always be a function (at least) of the type of program you are doing.