News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Timing anomalies.

Started by hutch--, April 22, 2015, 01:50:28 AM

Previous topic - Next topic

hutch--

This was a simple test on a task to count ascii 13 in a text file to try and improve the speed of a tokeniser. I did not get any real improvement in the second version but found a weird anomaly in the timing method. I put the 2 algos into library modules to try and normalise the timings but found the timings at least as quirky as having the code locally in the main module. I tried a simple experiment of changing the order of the two prototypes and produced 2 different sets of timings.

With the 2 prototypes in this order,

    cr_cnt2 PROTO :DWORD
    cr_cnt PROTO :DWORD

I get this timings between the two algos.

889 ms 1
843 ms 2
874 ms 1
827 ms 2
874 ms 1
827 ms 2
874 ms 1
843 ms 2
Press any key to continue ...


Change the order to,

    cr_cnt PROTO :DWORD
    cr_cnt2 PROTO :DWORD

And you get this timing.

1076 ms 1
842 ms 2
1076 ms 1
843 ms 2
1077 ms 1
842 ms 2
1076 ms 1
826 ms 2
Press any key to continue ...


dedndave

that makes no sense at all - something else must be different between the 2 runs

dedndave

crashes right away under xp-32

dedndave

even so.....

you should be able to change the order of the prototypes and create the exact same EXE
compare the 2 EXE's - shouldn't be any difference

hutch--

Dave,

Just rebuild the code on XP then run it. The test piece on my Win7 64 was built with ML version 9.0, I just copied the entire directory to an XP box, rebuilt it with ML 6.14 and it worked correctly.

dedndave

ok....

looks like the assembler pulls them in in the order prototyped
the old "code location sensitivity" issue - good to be aware of   :biggrin:

Comparing files cz1.exe and CZ2.EXE
000000C8: 96 CF
000001C0: 42 4A
00000464: B8 68
00000465: 03 02
000004B1: 1B 4B
000006E2: 0F 74
000006E3: 84 06
000006E4: 27 85
000006E5: 01 D2
000006E6: 00 75
000006E7: 00 F1
000006E8: 85 EB
000006E9: D2 05
000006EA: 0F 83
000006EB: 84 C0
000006EC: 27 01
000006ED: 01 EB
000006EE: 00 EA
000006EF: 00 C2
000006F0: 83 04
000006F1: C1 00
000006F2: 01 CC
000006F3: 0F CC
000006F4: B6 CC
000006F5: 11 CC
000006F6: 83 CC
000006F7: FA CC
000006F8: 0D CC
000006F9: 0F CC
000006FA: 84 CC
000006FB: 10 CC
000006FC: 01 CC
000006FD: 00 CC
000006FE: 00 CC
000006FF: 85 CC
00000700: D2 8B
00000701: 0F 4C
00000702: 84 24
00000703: 10 04
00000704: 01 33
00000705: 00 C0
00000706: 00 83
00000707: 83 E9
00000708: C1 01
00000709: 01 83
0000070A: 0F C1
0000070B: B6 01
0000070C: 11 0F
0000070D: 83 B6
0000070E: FA 11
0000070F: 0D 83
00000710: 0F FA
00000711: 84 0D
00000712: F9 0F
00000713: 00 84
00000714: 00 27
00000715: 00 01
00000716: 85 00
00000717: D2 00
00000718: 0F 85
00000719: 84 D2
0000071A: F9 0F
0000071B: 00 84
0000071C: 00 27
0000071D: 00 01
0000071E: 83 00
0000071F: C1 00
00000720: 01 83
00000721: 0F C1
00000722: B6 01
00000723: 11 0F
00000724: 83 B6
00000725: FA 11
00000726: 0D 83
00000727: 0F FA
00000728: 84 0D
00000729: E2 0F
0000072A: 00 84
0000072B: 00 10
0000072C: 00 01
0000072D: 85 00
0000072E: D2 00
0000072F: 0F 85
00000730: 84 D2
00000731: E2 0F
00000732: 00 84
00000733: 00 10
00000734: 00 01
00000735: 83 00
00000736: C1 00
00000737: 01 83
00000738: 0F C1
00000739: B6 01
0000073A: 11 0F
0000073B: 83 B6
0000073C: FA 11
0000073D: 0D 83
0000073E: 0F FA
0000073F: 84 0D
00000740: CB 0F
00000741: 00 84
00000742: 00 F9
00000744: 85 00
00000745: D2 00
00000746: 0F 85
00000747: 84 D2
00000748: CB 0F
00000749: 00 84
0000074A: 00 F9
0000074C: 83 00
0000074D: C1 00
0000074E: 01 83
0000074F: 0F C1
00000750: B6 01
00000751: 11 0F
00000752: 83 B6
00000753: FA 11
00000754: 0D 83
00000755: 0F FA
00000756: 84 0D
00000757: B4 0F
00000758: 00 84
00000759: 00 E2
0000075B: 85 00
0000075C: D2 00
0000075D: 0F 85
0000075E: 84 D2
0000075F: B4 0F
00000760: 00 84
00000761: 00 E2
00000763: 83 00
00000764: C1 00
00000765: 01 83
00000766: 0F C1
00000767: B6 01
00000768: 11 0F
00000769: 83 B6
0000076A: FA 11
0000076B: 0D 83
0000076C: 0F FA
0000076D: 84 0D
0000076E: 9D 0F
0000076F: 00 84
00000770: 00 CB
00000772: 85 00
00000773: D2 00
00000774: 0F 85
00000775: 84 D2
00000776: 9D 0F
00000777: 00 84
00000778: 00 CB
0000077A: 83 00
0000077B: C1 00
0000077C: 01 83
0000077D: 0F C1
0000077E: B6 01
0000077F: 11 0F
00000780: 83 B6
00000781: FA 11
00000782: 0D 83
00000783: 0F FA
00000784: 84 0D
00000785: 86 0F
00000786: 00 84
00000787: 00 B4
00000789: 85 00
0000078A: D2 00
0000078B: 0F 85
0000078C: 84 D2
0000078D: 86 0F
0000078E: 00 84
0000078F: 00 B4
00000791: 83 00
00000792: C1 00
00000793: 01 83
00000794: 0F C1
00000795: B6 01
00000796: 11 0F
00000797: 83 B6
00000798: FA 11
00000799: 0D 83
0000079A: 74 FA
0000079B: 73 0D
0000079C: 85 0F
0000079D: D2 84
0000079E: 74 9D
0000079F: 77 00
000007A0: 83 00
000007A1: C1 00
000007A2: 01 85
000007A3: 0F D2
000007A4: B6 0F
000007A5: 11 84
000007A6: 83 9D
000007A7: FA 00
000007A8: 0D 00
000007A9: 74 00
000007AA: 64 83
000007AB: 85 C1
000007AC: D2 01
000007AD: 74 0F
000007AE: 68 B6
000007AF: 83 11
000007B0: C1 83
000007B1: 01 FA
000007B2: 0F 0D
000007B3: B6 0F
000007B4: 11 84
000007B5: 83 86
000007B6: FA 00
000007B7: 0D 00
000007B8: 74 00
000007B9: 55 85
000007BA: 85 D2
000007BB: D2 0F
000007BC: 74 84
000007BD: 59 86
000007BE: 83 00
000007BF: C1 00
000007C0: 01 00
000007C1: 0F 83
000007C2: B6 C1
000007C3: 11 01
000007C4: 83 0F
000007C5: FA B6
000007C6: 0D 11
000007C7: 74 83
000007C8: 46 FA
000007C9: 85 0D
000007CA: D2 74
000007CB: 74 73
000007CC: 4A 85
000007CD: 83 D2
000007CE: C1 74
000007CF: 01 77
000007D0: 0F 83
000007D1: B6 C1
000007D2: 11 01
000007D3: 83 0F
000007D4: FA B6
000007D5: 0D 11
000007D6: 74 83
000007D7: 37 FA
000007D8: 85 0D
000007D9: D2 74
000007DA: 74 64
000007DB: 3B 85
000007DC: 83 D2
000007DD: C1 74
000007DE: 01 68
000007DF: 0F 83
000007E0: B6 C1
000007E1: 11 01
000007E2: 83 0F
000007E3: FA B6
000007E4: 0D 11
000007E5: 74 83
000007E6: 28 FA
000007E7: 85 0D
000007E8: D2 74
000007E9: 74 55
000007EA: 2C 85
000007EB: 83 D2
000007EC: C1 74
000007ED: 01 59
000007EE: 0F 83
000007EF: B6 C1
000007F0: 11 01
000007F1: 83 0F
000007F2: FA B6
000007F3: 0D 11
000007F4: 74 83
000007F5: 19 FA
000007F6: 85 0D
000007F7: D2 74
000007F8: 74 46
000007F9: 1D 85
000007FA: 83 D2
000007FB: C1 74
000007FC: 01 4A
000007FD: 0F 83
000007FE: B6 C1
000007FF: 11 01
00000800: 83 0F
00000801: FA B6
00000802: 0D 11
00000803: 74 83
00000804: 0A FA
00000805: 85 0D
00000806: D2 74
00000807: 0F 37
00000809: CC D2
0000080A: FE 74
0000080B: FF 3B
0000080C: FF 83
0000080D: EB C1
0000080E: 08 01
0000080F: 83 0F
00000810: C0 B6
00000811: 01 11
00000812: E9 83
00000813: C2 FA
00000814: FE 0D
00000815: FF 74
00000816: FF 28
00000817: C2 85
00000818: 04 D2
00000819: 00 74
0000081A: CC 2C
0000081B: CC 83
0000081C: CC C1
0000081D: CC 01
0000081E: CC 0F
0000081F: CC B6
00000820: 8B 11
00000821: 4C 83
00000822: 24 FA
00000823: 04 0D
00000824: 33 74
00000825: C0 19
00000826: 83 85
00000827: E9 D2
00000828: 01 74
00000829: 83 1D
0000082A: C1 83
0000082B: 01 C1
0000082C: 0F 01
0000082D: B6 0F
0000082E: 11 B6
0000082F: 83 11
00000830: FA 83
00000831: 0D FA
00000832: 74 0D
00000833: 06 74
00000834: 85 0A
00000835: D2 85
00000836: 75 D2
00000837: F1 0F
00000838: EB 85
00000839: 05 CC
0000083A: 83 FE
0000083B: C0 FF
0000083C: 01 FF
0000083E: EA 08
0000083F: C2 83
00000840: 04 C0
00000841: 00 01
00000842: 00 E9
00000843: 00 C2
00000844: 00 FE
00000845: 00 FF
00000846: 00 FF
00000847: 00 C2
00000848: 00 04

Gunther

Hi Hutch,

cz.exe from post #1 doesn't work under Win 7-64.

Gunther
You have to know the facts before you can distort them.

jj2007

cz.exe on i5, Win7-64:

842 ms 1
826 ms 2
873 ms 1
827 ms 2
889 ms 1
827 ms 2
905 ms 1
842 ms 2

Siekmanski

Here it's just the other way around, Windows 8.1

    cr_cnt2 PROTO :DWORD
    cr_cnt PROTO :DWORD

1140 ms 1
844 ms 2
1141 ms 1
844 ms 2
1157 ms 1
844 ms 2
1140 ms 1
844 ms 2

    cr_cnt PROTO :DWORD
    cr_cnt2 PROTO :DWORD

891 ms 1
828 ms 2
906 ms 1
859 ms 2
890 ms 1
860 ms 2
906 ms 1
859 ms 2


Maybe L1 cache alignment ?
Creative coders use backward thinking techniques as a strategy.

anunitu

You of course have to allow for the NSA's read time,and you know they code slow anyway.... :icon_rolleyes: :shock: