Hi!
Benchmark threads are always fun!
This one compares 6 slight variations of the same thing.
The code is simple and self-explanatory.
Please run it more than once.
(credits: MultiTimers.asm is a Marinus library, I changed it a bit for my needs)
Any input == very welcome!
Below are some runs on Win10 64-bit, with mixed results:
run #1:
Test A: 11164.696405
Test A2: 10969.563025
Test B1: 11163.659643
Test B2: 7418.603591 <=
Test B3: 8528.724294
Test B4: 7515.563650
run #2:
Test A: 7426.077177
Test A2: 7309.214785
Test B1: 7353.330968
Test B2: 5535.438268
Test B3: 6036.998152
Test B4: 5116.531635 <=
run #3:
Test A: 6920.856352
Test A2: 6881.980504
Test B1: 6897.736518
Test B2: 6877.552658
Test B3: 7867.744807
Test B4: 6859.148761 <=
run #4:
Test A: 5252.094767
Test A2: 5409.126498
Test B1: 5227.336097
Test B2: 5169.841561 <=
Test B3: 5940.916224
Test B4: 5178.427761
run #5:
Test A: 7407.886978
Test A2: 7352.514103
Test B1: 7413.644783
Test B2: 7349.053366 <=
Test B3: 8465.262267
Test B4: 7550.733337
Performance test begins:
Test A: 16.096475
Test A2: 15.952794
Test B1: 16.059118
Test B2: 16.068971
Test B3: 14.664583
Test B4: 12.828328
Performance test begins:
Test A: 16.499195
Test A2: 16.025456
Test B1: 16.015603
Test B2: 16.436386
Test B3: 15.005725
Test B4: 16.929009
Haswell E/EP
Performance test begins:
Test A: 6342.841800
Test A2: 6264.233800
Test B1: 6189.494200
Test B2: 6232.060000
Test B3: 5614.326400
Test B4: 5057.885700
Press any key to continue ...
Win 8.1 i7-4930K
Performance test begins:
Test A: 20.877793
Test A2: 25.706060
Test B1: 13.619329
Test B2: 13.611995
Test B3: 12.385303
Test B4: 11.139614
These are strange results. I ran it eight times and here is what I got, in this order:
i7-6700K @ 4.00GHz 16GB ram
Performance test begins:
Test A: 6347.129157
Test A2: 6281.266327
Test B1: 6281.467653
Test B2: 6276.904104
Test B3: 7174.972835
Test B4: 6278.010372
Press any key to continue ...
Performance test begins:
Test A: 6969.436843
Test A2: 6867.588864
Test B1: 6943.325842
Test B2: 7140.264365
Test B3: 7801.162425
Test B4: 6823.255602
Press any key to continue ...
Performance test begins:
Test A: 131.221323
Test A2: 129.574439
Test B1: 130.243055
Test B2: 131.238441
Test B3: 147.944116
Test B4: 130.590775
Press any key to continue ...
Performance test begins:
Test A: 6452.746624
Test A2: 6428.123592
Test B1: 6430.699946
Test B2: 6425.102177
Test B3: 7346.866769
Test B4: 6427.041340
Press any key to continue ...
Performance test begins:
Test A: 6766.527791
Test A2: 6820.317221
Test B1: 6823.176912
Test B2: 6828.999001
Test B3: 7792.153619
Test B4: 6830.869949
Press any key to continue ...
Performance test begins:
Test A: 1467.332506
Test A2: 1456.524033
Test B1: 1455.340607
Test B2: 1457.027347
Test B3: 1672.604833
Test B4: 1455.453789
Press any key to continue ...
Performance test begins:
Test A: 4020.751092
Test A2: 4014.235095
Test B1: 4014.843159
Test B2: 4011.761959
Test B3: 4593.479812
Test B4: 4017.827018
Press any key to continue ...
Performance test begins:
Test A: 2292.837180
Test A2: 2293.784790
Test B1: 2291.060508
Test B2: 2292.725020
Test B3: 2616.233174
Test B4: 2291.601123
Press any key to continue ...
There is something wrong with the tests.
Too much variance.
Quote from: AW on November 08, 2018, 05:51:22 PM
There is something wrong with the tests.
Too much variance.
Hi Jose, you are right! I noticed this shortly after I posted but didn't have time to correct it.
Simple: Marinus's routine was trashing ecx, which is used as the loop counter. I inverted the order and everything is OK now. I'm also "printfing" esi along with the results, just to make sure the values match across all tests.
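For anyone who wants to try the same two safeguards outside the testbed, here is a minimal C sketch (not the MultiTimers.asm code). It times each variant several times and keeps the best run to damp the variance, and it returns the accumulated value so the variants can be compared, the way esi is printed above. The two loop bodies and every function name here are hypothetical stand-ins, since the real test loops are only in the attached zip.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Every variant runs its loop and returns the accumulated value
   (the role esi plays in the testbed output). */
typedef uint32_t (*variant_fn)(uint32_t iterations);

/* Hypothetical variant 1: conditional inside the loop. */
static uint32_t sum_odd_branch(uint32_t iterations)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < iterations; ++i) {
        if (i & 1u)
            acc += i;   /* unsigned overflow wraps identically in both variants */
    }
    return acc;
}

/* Hypothetical variant 2: same result, no conditional in the loop body. */
static uint32_t sum_odd_stride(uint32_t iterations)
{
    uint32_t acc = 0;
    for (uint32_t i = 1; i < iterations; i += 2)
        acc += i;
    return acc;
}

/* Run fn `runs` times; return the fastest time in ms and store the result. */
static double best_ms(variant_fn fn, uint32_t iterations, int runs, uint32_t *result)
{
    assert(runs > 0);
    double best = -1.0;
    for (int r = 0; r < runs; ++r) {
        clock_t start = clock();
        *result = fn(iterations);
        clock_t stop = clock();
        double ms = 1000.0 * (double)(stop - start) / (double)CLOCKS_PER_SEC;
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}

/* Time both variants, print them, and return 1 only if the results match. */
static int compare_variants(uint32_t iterations, int runs)
{
    uint32_t r1 = 0, r2 = 0;
    double t1 = best_ms(sum_odd_branch, iterations, runs, &r1);
    double t2 = best_ms(sum_odd_stride, iterations, runs, &r2);
    printf("Test 1: %f result: %lu\n", t1, (unsigned long)r1);
    printf("Test 2: %f result: %lu\n", t2, (unsigned long)r2);
    return r1 == r2;
}
```

Calling `compare_variants(100000000u, 5)` from `main` prints one line per variant and returns 0 if the results ever disagree. `clock()` is coarse compared with a cycle counter, so this only shows the structure, not a replacement for the timer library.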
Also, I introduced a "testC", where I replace the memory "cmp" with a bitmask on a single dd. I was actually surprised that this method isn't much faster than test B4 (in many runs B4 actually wins).
testA == .IF macro
testB == Asm cmp
testC == Asm cmp with bit mask
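The actual test loops are in the zip, so the following is only a C sketch of the general idea behind testB versus testC, under the assumption that the conditions being checked are simple on/off states. The flag names, the struct, and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits packed into one 32-bit value (a single "dd"). */
#define FLAG_READY  0x00000001u
#define FLAG_DIRTY  0x00000002u
#define FLAG_LOCKED 0x00000004u

/* testB-style: each state sits in its own dword and is compared on its own. */
struct state_words {
    uint32_t ready;
    uint32_t dirty;
    uint32_t locked;
};

int needs_work_cmp(const struct state_words *s)
{
    return s->ready != 0 && s->dirty != 0 && s->locked == 0;
}

/* testC-style: mask out the bits of interest, then do a single compare. */
int needs_work_mask(uint32_t packed)
{
    const uint32_t mask = FLAG_READY | FLAG_DIRTY | FLAG_LOCKED;
    return (packed & mask) == (FLAG_READY | FLAG_DIRTY);
}

/* Build the packed form from the separate words. */
uint32_t pack_state(const struct state_words *s)
{
    return (s->ready  ? FLAG_READY  : 0u)
         | (s->dirty  ? FLAG_DIRTY  : 0u)
         | (s->locked ? FLAG_LOCKED : 0u);
}

/* Confirm both forms agree for all 8 flag combinations. */
void check_equivalence(void)
{
    for (uint32_t bits = 0; bits < 8; ++bits) {
        struct state_words s = { bits & 1u, (bits >> 1) & 1u, (bits >> 2) & 1u };
        assert(needs_work_cmp(&s) == needs_work_mask(pack_state(&s)));
    }
}
```

With the data already in cache, either form is only a few instructions per iteration, which may be part of why the testC and B4 timings land so close to each other.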
Zip file attached here (I also changed it in the OP)
Performance test begins:
Test A: 5375.086564 esi: -589934592
Test A2: 5349.759736 esi: -589934592
Test B1: 5545.249401 esi: -589934592
Test B2: 3735.930867 esi: -589934592
Test B3: 4284.832457 esi: -589934592
Test B4: 3749.020400 esi: -589934592
Test C: 3740.795590 esi: -589934592 <==
Test C2: 4284.017780 esi: -589934592
Performance test begins:
Test A: 5392.448228 esi: -589934592
Test A2: 5430.023656 esi: -589934592
Test B1: 5407.402329 esi: -589934592
Test B2: 3760.736944 esi: -589934592
Test B3: 4279.407962 esi: -589934592
Test B4: 3738.908048 esi: -589934592 <==
Test C: 3752.077080 esi: -589934592
Test C2: 4292.570795 esi: -589934592
Performance test begins:
Test A: 5651.308023 esi: -589934592
Test A2: 5824.626002 esi: -589934592
Test B1: 5341.540761 esi: -589934592
Test B2: 3734.443370 esi: -589934592
Test B3: 4272.311083 esi: -589934592
Test B4: 3755.025452 esi: -589934592
Test C: 3742.148158 esi: -589934592 <==
Test C2: 4265.206180 esi: -589934592
Haswell E/EP
Performance test begins:
Test A: 9534.766500 esi: -589934592
Test A2: 9507.666400 esi: -589934592
Test B1: 9375.177200 esi: -589934592
Test B2: 10077.460000 esi: -589934592
Test B3: 8395.685400 esi: -589934592
Test B4: 8371.843400 esi: -589934592
Test C: 8674.787200 esi: -589934592
Test C2: 8812.217000 esi: -589934592
Press any key to continue ...
It is better now. :t
i7-8700K @3.70GHz
Performance test begins:
Test A: 4630.867087 esi: -589934592
Test A2: 4605.566564 esi: -589934592
Test B1: 4592.344042 esi: -589934592
Test B2: 3218.725584 esi: -589934592
Test B3: 3700.482799 esi: -589934592
Test B4: 3209.868642 esi: -589934592
Test C: 3214.837650 esi: -589934592
Test C2: 3694.835833 esi: -589934592
Press any key to continue ...
i7-4810mq
Performance test begins:
Test A: 5785.226788 esi: -589934592
Test A2: 5919.552373 esi: -589934592
Test B1: 5890.797644 esi: -589934592
Test B2: 6444.327186 esi: -589934592
Test B3: 5861.268009 esi: -589934592
Test B4: 5852.370886 esi: -589934592
Test C: 5294.469393 esi: -589934592
Test C2: 4645.750682 esi: -589934592
Press any key to continue ...
Performance test begins:
Test A: 5928.559356 esi: -589934592
Test A2: 5968.172701 esi: -589934592
Test B1: 6011.132141 esi: -589934592
Test B2: 6515.751723 esi: -589934592
Test B3: 5909.522649 esi: -589934592
Test B4: 5915.338751 esi: -589934592
Test C: 5960.283151 esi: -589934592
Test C2: 4739.045465 esi: -589934592
Press any key to continue ...
F:\TEMP\TEST>testbed
Performance test begins:
Test A: 30718.174796 esi: -589934592
Test A2: 38910.549805 esi: -589934592
Test B1: 35904.093118 esi: -589934592
Test B2: 35919.995418 esi: -589934592
Test B3: 33645.956399 esi: -589934592
Test B4: 32922.090657 esi: -589934592
Test C: 29274.593838 esi: -589934592
Test C2: 34299.185790 esi: -589934592
Press any key to continue ...
Hi,
I managed to consistently beat the previous times by using jecxz (the jcxz algo is slightly slower on my machine). This algo wins in every run.
testD == jcxz / jecxz
Performance test begins:
Test A: 3782.529372 esi: -589934592
Test A2: 3851.802446 esi: -589934592
Test B1: 3759.764728 esi: -589934592
Test B2: 3870.786900 esi: -589934592
Test B3: 4400.107068 esi: -589934592
Test B4: 3789.277990 esi: -589934592
Test C: 3927.359910 esi: -589934592
Test C2: 4303.804513 esi: -589934592
Test D: 2968.206445 esi: -589934592
Test D2: 2552.859739 esi: -589934592 <==
Performance test begins:
Test A: 3797.827358 esi: -589934592
Test A2: 3845.992493 esi: -589934592
Test B1: 3785.149904 esi: -589934592
Test B2: 3762.598958 esi: -589934592
Test B3: 4386.555860 esi: -589934592
Test B4: 3757.113564 esi: -589934592
Test C: 3922.216578 esi: -589934592
Test C2: 4299.821201 esi: -589934592
Test D: 2906.951777 esi: -589934592
Test D2: 2505.042136 esi: -589934592 <==
Performance test begins:
Test A: 6950.205613 esi: -589934592
Test A2: 6839.243535 esi: -589934592
Test B1: 6901.606908 esi: -589934592
Test B2: 6285.683585 esi: -589934592
Test B3: 5846.524628 esi: -589934592
Test B4: 4973.496701 esi: -589934592
Test C: 5590.548869 esi: -589934592
Test C2: 6348.380381 esi: -589934592
Test D: 3791.305599 esi: -589934592
Test D2: 3245.861136 esi: -589934592
Press any key to continue ...
Performance test begins:
Test A: 6885.879560 esi: -589934592
Test A2: 6869.180720 esi: -589934592
Test B1: 6890.368748 esi: -589934592
Test B2: 6103.885550 esi: -589934592
Test B3: 5580.385077 esi: -589934592
Test B4: 4904.580051 esi: -589934592
Test C: 5620.213463 esi: -589934592
Test C2: 6236.643973 esi: -589934592
Test D: 3740.016888 esi: -589934592
Test D2: 3049.097511 esi: -589934592
Press any key to continue ...
Quote
Performance test begins:
Test A: 7568.845361 esi: -589934592
Test A2: 7395.525548 esi: -589934592
Test B1: 7401.951468 esi: -589934592
Test B2: 6732.240622 esi: -589934592
Test B3: 6061.375539 esi: -589934592
Test B4: 5388.420513 esi: -589934592
Test C: 6074.134579 esi: -589934592
Test C2: 6076.328869 esi: -589934592
Test D: 3519.020742 esi: -589934592
Test D2: 3055.109561 esi: -589934592
Press any key to continue ...
D2 is a clear winner...
Can anyone find a way to beat D2?
Jose just posted a spin-off thread that I am very curious to check out, but I am keeping this one going in parallel.