The attachment contains a test of the POPCNT instruction against the sort of code previously used for the purpose.
The very repeatable results running on my Core-i3 G3220:
9 cycles, popcnt*10
36 cycles, popcnt1*10
11 cycles, popcnt2*10
9 cycles, popcnt*10
36 cycles, popcnt1*10
11 cycles, popcnt2*10
9 cycles, popcnt*10
36 cycles, popcnt1*10
11 cycles, popcnt2*10
For timing single instructions, and for recent processors even short sequences of instructions, the one-cycle resolution of the TSC presents a problem. It's difficult to know what constitutes a representative test.
BTW, I assembled with JWASM.
For comparison, PopCount() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1029) on a CPU that doesn't support the native popcnt
Intel(R) Celeron(R) M CPU
76 cycles, popcntMB*10
91 cycles, popcnt1*10
22 cycles, popcnt2*10
Hi Michael,
here are your timings:
Quote
8 cycles, popcnt*10
40 cycles, popcnt1*10
13 cycles, popcnt2*10
8 cycles, popcnt*10
40 cycles, popcnt1*10
12 cycles, popcnt2*10
8 cycles, popcnt*10
40 cycles, popcnt1*10
12 cycles, popcnt2*10
Press any key to continue ...
Gunther
i7-4930K CPU @ 3.40GHz
win 8.1 x64
Quote
8 cycles, popcnt*10
45 cycles, popcnt1*10
13 cycles, popcnt2*10
9 cycles, popcnt*10
45 cycles, popcnt1*10
13 cycles, popcnt2*10
9 cycles, popcnt*10
45 cycles, popcnt1*10
14 cycles, popcnt2*10
Press any key to continue ...
This is the result on my i7 64 bit box.
4 cycles, popcnt*10
58 cycles, popcnt1*10
14 cycles, popcnt2*10
4 cycles, popcnt*10
58 cycles, popcnt1*10
14 cycles, popcnt2*10
4 cycles, popcnt*10
58 cycles, popcnt1*10
14 cycles, popcnt2*10
Press any key to continue ...
Not an easy choice:
- "true" popcnt excludes many older CPUs
- popcnt1 is kind of slow
- popcnt2 adds 256k of bloat
Attached the code I used for the test above.
There is a 2010 thread here (http://www.masmforum.com/board/index.php?topic=11061.msg128952#msg128952) with code that uses this popcnt implementation by Alex/Antariy:
@popcnt:
movzx ecx,al
shr eax,8
imul ecx,08040201h
imul eax,08040201h
shr ecx,3
shr eax,3
and ecx,11111111h
and eax,11111111h
imul ecx,11111111h
imul eax,11111111h
shr ecx,28
shr eax,28
add eax, ecx ; <<< not in the source
ret
Problem is that a) I am quite sure it ran fine in 2010, and very fast, b) it crashes now.
I have stared at it for quite a while, but the fog doesn't go away. Any ideas?
P.S.: Alex's popcount is 16-bit only, and would need an add eax, ecx to work (which in the source is present but higher up).
Here are some timings (the last one is the native popcnt instruction):
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
653 ms, sum alex=799979533
543 ms, sum mb32=799979533
475 ms, sum mb16=799979533
446 ms, sum popc=799979533
631 ms, sum alex=799979533
549 ms, sum mb32=799979533
478 ms, sum mb16=799979533
482 ms, sum popc=799979533
649 ms, sum alex=799979533
560 ms, sum mb32=799979533
472 ms, sum mb16=799979533
449 ms, sum popc=799979533
Can I have some timings please?
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Counting 1000* the lines in Windows.inc
naive MasmB w/frame lfcnt
1154 75 84 ms 1145 ms
1111 77 84 ms 1143 ms
1122 79 86 ms 1138 ms
1113 78 86 ms 1147 ms
1217 76 84 ms 1142 ms
1245 75 88 ms 1152 ms
1107 75 86 ms 1146 ms
1106 75 84 ms 1145 ms
26902 lines found
Intel(R) Pentium(R) 4 CPU 3.00GHz
Counting 1000* the lines in Windows.inc
naive MasmB w/frame lfcnt
1515 296 290 ms 1937 ms
1528 292 309 ms 1925 ms
1521 299 291 ms 1934 ms
1518 290 298 ms 1926 ms
1518 296 291 ms 1934 ms
1524 287 295 ms 1934 ms
1605 292 291 ms 1933 ms
1531 286 297 ms 1930 ms
26894 lines found
Thanks, Dave, and welcome back :icon14:
Btw your windows.inc seems 8 lines shorter than mine, which is 977412 bytes of 10 Jan 2012.
P.S.: We just elected a new Mayor of Rome (http://www.bbc.com/news/world-europe-36569410):(http://www.lastampa.it/rf/image_lowres/Pub/p4/2016/06/19/Italia/Foto/RitagliWeb/2016-06-19T234531Z_536217261_S1AETKWVCDAA_RTRMADP_3_ITALY-VOTE-MAYORS-kc3-U10808521269488hG-1024x400@LaStampa.it-Home%20Page.JPG)
Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Counting 1000* the lines in Windows.inc
naive MasmB w/frame lfcnt
754 50 54 ms 669 ms
754 50 55 ms 693 ms
772 51 53 ms 675 ms
750 50 54 ms 665 ms
768 50 53 ms 669 ms
776 50 55 ms 667 ms
743 50 53 ms 670 ms
744 50 53 ms 668 ms
26902 lines found
i7 likes algos developed on little brother :bgrin:
QuoteIntel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Counting 1000* the lines in Windows.inc
naive MasmB w/frame lfcnt
1140 96 109 ms 1114 ms
1163 85 88 ms 1053 ms
1209 80 83 ms 1037 ms
1144 77 82 ms 1057 ms
1156 77 82 ms 1059 ms
1161 80 84 ms 1017 ms
1140 78 83 ms 1041 ms
1140 77 83 ms 1015 ms
26902 lines found
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
384 ms, sum alex=799979533
346 ms, sum mb32=799979533
311 ms, sum mb16=799979533
297 ms, sum popc=799979533
395 ms, sum alex=799979533
347 ms, sum mb32=799979533
302 ms, sum mb16=799979533
345 ms, sum popc=799979533
384 ms, sum alex=799979533
348 ms, sum mb32=799979533
318 ms, sum mb16=799979533
295 ms, sum popc=799979533
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz
Counting 1000* the lines in Windows.inc
naive MasmB w/frame lfcnt
723 48 52 ms 648 ms
714 50 52 ms 647 ms
743 48 52 ms 648 ms
774 49 52 ms 652 ms
724 48 53 ms 649 ms
734 49 52 ms 640 ms
716 50 53 ms 646 ms
722 49 52 ms 643 ms
22274 lines found
Thanks, LiaoMi :icon14:
Your line count is low - is that the latest Masm32 SDK?
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
Counting 1000* the lines in Windows.inc
naive MasmB w/frame lfcnt
902 77 64 ms 838 ms
888 69 106 ms 774 ms
961 107 80 ms 837 ms
868 54 59 ms 731 ms
824 55 59 ms 774 ms
829 73 59 ms 764 ms
822 54 59 ms 731 ms
821 79 85 ms 840 ms
26897 lines found
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
459 ms, sum alex=799979533
359 ms, sum mb32=799979533
332 ms, sum mb16=799979533
344 ms, sum popc=799979533
461 ms, sum alex=799979533
359 ms, sum mb32=799979533
358 ms, sum mb16=799979533
374 ms, sum popc=799979533
461 ms, sum alex=799979533
369 ms, sum mb32=799979533
351 ms, sum mb16=799979533
344 ms, sum popc=799979533
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
Counting 1000* the lines in Windows.inc
naive MasmB w/frame lfcnt
1148 74 100 ms 1191 ms
1141 74 81 ms 1191 ms
1141 74 81 ms 1204 ms
1141 74 81 ms 1192 ms
1291 74 81 ms 1192 ms
1307 74 81 ms 1192 ms
1141 74 81 ms 1192 ms
1141 74 81 ms 1191 ms
30994 lines found
Quote from: jj2007 on June 21, 2016, 08:43:58 AM
Thanks, LiaoMi :icon14:
Your line count is low - is that the latest Masm32 SDK?
Hallo,
no, I noticed that too, it was this version
WINDOWS.INC for 32 bit MASM (Version 1.4c RELEASE April 2008), now the test for a new version -
WINDOWS.INC for 32 bit MASM (Version 1.6 RELEASE January 2012)Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz
Counting 1000* the lines in Windows.inc
naive MasmB w/frame lfcnt
838 60 60 ms 747 ms
837 56 61 ms 748 ms
854 58 60 ms 742 ms
894 57 63 ms 750 ms
848 56 63 ms 752 ms
846 57 60 ms 751 ms
840 57 61 ms 752 ms
834 54 62 ms 749 ms
26902 lines found
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
382 ms, sum alex=799979533
345 ms, sum mb32=799979533
313 ms, sum mb16=799979533
297 ms, sum popc=799979533
390 ms, sum alex=799979533
343 ms, sum mb32=799979533
298 ms, sum mb16=799979533
344 ms, sum popc=799979533
378 ms, sum alex=799979533
344 ms, sum mb32=799979533
317 ms, sum mb16=799979533
290 ms, sum popc=799979533
Best regards
Quote from: LiaoMi on June 21, 2016, 11:08:26 PMnew version - WINDOWS.INC for 32 bit MASM (Version 1.6 RELEASE January 2012)
...
26902 lines found
Yep, that's the current one :t
Question is now what Siekmanski has put into the over 4,000 extra lines ::)
Quote from: jj2007 on June 22, 2016, 02:04:14 AM
Question is now what Siekmanski has put into the over 4,000 extra lines ::)
The Dutch are the tallest people on earth and also have the largest Windows.Inc files. :biggrin: