The MASM Forum

General => The Laboratory => Topic started by: MichaelW on August 16, 2014, 02:31:46 AM

Title: PopCount
Post by: MichaelW on August 16, 2014, 02:31:46 AM
The attachment contains a test of the POPCNT instruction against the sort of code previously used for the purpose.

The very repeatable results running on my Core-i3 G3220:

9 cycles, popcnt*10
36 cycles, popcnt1*10
11 cycles, popcnt2*10

9 cycles, popcnt*10
36 cycles, popcnt1*10
11 cycles, popcnt2*10

9 cycles, popcnt*10
36 cycles, popcnt1*10
11 cycles, popcnt2*10


For timing single instructions, and for recent processors even short sequences of instructions, the one-cycle resolution of the TSC presents a problem. It's difficult to know what constitutes a representative test. 

BTW, I assembled with JWASM.

Title: Re: PopCount
Post by: jj2007 on August 16, 2014, 03:37:43 AM
For comparison, PopCount() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1029) on a CPU that doesn't support the native popcnt

Intel(R) Celeron(R) M CPU
76 cycles, popcntMB*10
91 cycles, popcnt1*10
22 cycles, popcnt2*10
Title: Re: PopCount
Post by: Gunther on August 16, 2014, 08:40:30 AM
Hi Michael,

here are your timings:
Quote
8 cycles, popcnt*10
40 cycles, popcnt1*10
13 cycles, popcnt2*10

8 cycles, popcnt*10
40 cycles, popcnt1*10
12 cycles, popcnt2*10

8 cycles, popcnt*10
40 cycles, popcnt1*10
12 cycles, popcnt2*10

Press any key to continue ...

Gunther
Title: Re: PopCount
Post by: Siekmanski on August 16, 2014, 08:47:53 AM
 i7-4930K CPU @ 3.40GHz
win 8.1 x64

Quote

8 cycles, popcnt*10
45 cycles, popcnt1*10
13 cycles, popcnt2*10

9 cycles, popcnt*10
45 cycles, popcnt1*10
13 cycles, popcnt2*10

9 cycles, popcnt*10
45 cycles, popcnt1*10
14 cycles, popcnt2*10

Press any key to continue ...
Title: Re: PopCount
Post by: hutch-- on August 16, 2014, 08:48:28 AM
This is the result on my i7 64 bit box.




4 cycles, popcnt*10
58 cycles, popcnt1*10
14 cycles, popcnt2*10

4 cycles, popcnt*10
58 cycles, popcnt1*10
14 cycles, popcnt2*10

4 cycles, popcnt*10
58 cycles, popcnt1*10
14 cycles, popcnt2*10

Press any key to continue ...
Title: Re: PopCount
Post by: jj2007 on August 16, 2014, 09:23:16 AM
Not an easy choice:

- "true" popcnt excludes many older CPUs
- popcnt1 is kind of slow
- popcnt2 adds 256k of bloat

Attached the code I used for the test above.
Title: Re: PopCount
Post by: jj2007 on June 19, 2016, 09:52:10 AM
There is a 2010 thread here (http://www.masmforum.com/board/index.php?topic=11061.msg128952#msg128952) with code that uses this popcnt implementation by Alex/Antariy:
@popcnt:
movzx ecx,al
shr eax,8
imul ecx,08040201h
imul eax,08040201h
shr ecx,3
shr eax,3
and ecx,11111111h
and eax,11111111h
imul ecx,11111111h
imul eax,11111111h
shr ecx,28
shr eax,28
add eax, ecx  ; <<< not in the source
ret


Problem is that a) I am quite sure it ran fine in 2010, and very fast, b) it crashes now.
I have stared at it for quite a while, but the fog doesn't go away. Any ideas?

P.S.: Alex's popcount is 16-bit only, and would need an add eax, ecx to work (which in the source is present but higher up).

Here are some timings (the last one is the native popcnt instruction):
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
653 ms, sum alex=799979533
543 ms, sum mb32=799979533
475 ms, sum mb16=799979533
446 ms, sum popc=799979533

631 ms, sum alex=799979533
549 ms, sum mb32=799979533
478 ms, sum mb16=799979533
482 ms, sum popc=799979533

649 ms, sum alex=799979533
560 ms, sum mb32=799979533
472 ms, sum mb16=799979533
449 ms, sum popc=799979533
Title: Re: PopCount
Post by: jj2007 on June 20, 2016, 10:21:29 AM
Can I have some timings please?

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Counting 1000* the lines in Windows.inc
naive   MasmB   w/frame lfcnt
1154    75      84 ms   1145 ms
1111    77      84 ms   1143 ms
1122    79      86 ms   1138 ms
1113    78      86 ms   1147 ms
1217    76      84 ms   1142 ms
1245    75      88 ms   1152 ms
1107    75      86 ms   1146 ms
1106    75      84 ms   1145 ms
26902 lines found
Title: Re: PopCount
Post by: dedndave on June 20, 2016, 10:39:00 AM
Intel(R) Pentium(R) 4 CPU 3.00GHz
Counting 1000* the lines in Windows.inc
naive   MasmB   w/frame lfcnt
1515    296     290 ms  1937 ms
1528    292     309 ms  1925 ms
1521    299     291 ms  1934 ms
1518    290     298 ms  1926 ms
1518    296     291 ms  1934 ms
1524    287     295 ms  1934 ms
1605    292     291 ms  1933 ms
1531    286     297 ms  1930 ms
26894 lines found
Title: Re: PopCount
Post by: jj2007 on June 20, 2016, 11:03:26 AM
Thanks, Dave, and welcome back :icon14:

Btw your windows.inc seems 8 lines shorter than mine, which is 977412 bytes of 10 Jan 2012.

P.S.: We just elected a new Mayor of Rome (http://www.bbc.com/news/world-europe-36569410):(http://www.lastampa.it/rf/image_lowres/Pub/p4/2016/06/19/Italia/Foto/RitagliWeb/2016-06-19T234531Z_536217261_S1AETKWVCDAA_RTRMADP_3_ITALY-VOTE-MAYORS-kc3-U10808521269488hG-1024x400@LaStampa.it-Home%20Page.JPG)
Title: Re: PopCount
Post by: sinsi on June 20, 2016, 02:50:20 PM
Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Counting 1000* the lines in Windows.inc
naive   MasmB   w/frame lfcnt
754     50      54 ms   669 ms
754     50      55 ms   693 ms
772     51      53 ms   675 ms
750     50      54 ms   665 ms
768     50      53 ms   669 ms
776     50      55 ms   667 ms
743     50      53 ms   670 ms
744     50      53 ms   668 ms
26902 lines found
Title: Re: PopCount
Post by: jj2007 on June 20, 2016, 04:45:17 PM
i7 likes algos developed on little brother :bgrin:
Title: Re: PopCount
Post by: mabdelouahab on June 20, 2016, 10:24:42 PM
QuoteIntel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Counting 1000* the lines in Windows.inc
naive   MasmB   w/frame lfcnt
1140    96      109 ms  1114 ms
1163    85      88 ms   1053 ms
1209    80      83 ms   1037 ms
1144    77      82 ms   1057 ms
1156    77      82 ms   1059 ms
1161    80      84 ms   1017 ms
1140    78      83 ms   1041 ms
1140    77      83 ms   1015 ms
26902 lines found
Title: Re: PopCount
Post by: LiaoMi on June 21, 2016, 08:32:25 AM
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
384 ms, sum alex=799979533
346 ms, sum mb32=799979533
311 ms, sum mb16=799979533
297 ms, sum popc=799979533

395 ms, sum alex=799979533
347 ms, sum mb32=799979533
302 ms, sum mb16=799979533
345 ms, sum popc=799979533

384 ms, sum alex=799979533
348 ms, sum mb32=799979533
318 ms, sum mb16=799979533
295 ms, sum popc=799979533


Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz
Counting 1000* the lines in Windows.inc
naive   MasmB   w/frame lfcnt
723     48      52 ms   648 ms
714     50      52 ms   647 ms
743     48      52 ms   648 ms
774     49      52 ms   652 ms
724     48      53 ms   649 ms
734     49      52 ms   640 ms
716     50      53 ms   646 ms
722     49      52 ms   643 ms
22274 lines found


Title: Re: PopCount
Post by: jj2007 on June 21, 2016, 08:43:58 AM
Thanks, LiaoMi :icon14:

Your line count is low - is that the latest Masm32 SDK?
Title: Re: PopCount
Post by: hutch-- on June 21, 2016, 10:55:06 AM

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
Counting 1000* the lines in Windows.inc
naive   MasmB   w/frame lfcnt
902     77      64 ms   838 ms
888     69      106 ms  774 ms
961     107     80 ms   837 ms
868     54      59 ms   731 ms
824     55      59 ms   774 ms
829     73      59 ms   764 ms
822     54      59 ms   731 ms
821     79      85 ms   840 ms
26897 lines found
Title: Re: PopCount
Post by: Siekmanski on June 21, 2016, 11:36:01 AM
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
459 ms, sum alex=799979533
359 ms, sum mb32=799979533
332 ms, sum mb16=799979533
344 ms, sum popc=799979533

461 ms, sum alex=799979533
359 ms, sum mb32=799979533
358 ms, sum mb16=799979533
374 ms, sum popc=799979533

461 ms, sum alex=799979533
369 ms, sum mb32=799979533
351 ms, sum mb16=799979533
344 ms, sum popc=799979533


Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
Counting 1000* the lines in Windows.inc
naive   MasmB   w/frame lfcnt
1148    74      100 ms  1191 ms
1141    74      81 ms   1191 ms
1141    74      81 ms   1204 ms
1141    74      81 ms   1192 ms
1291    74      81 ms   1192 ms
1307    74      81 ms   1192 ms
1141    74      81 ms   1192 ms
1141    74      81 ms   1191 ms
30994 lines found
Title: Re: PopCount
Post by: LiaoMi on June 21, 2016, 11:08:26 PM
Quote from: jj2007 on June 21, 2016, 08:43:58 AM
Thanks, LiaoMi :icon14:

Your line count is low - is that the latest Masm32 SDK?

Hallo,

no, I noticed that too, it was this version WINDOWS.INC for 32 bit MASM (Version 1.4c RELEASE April 2008), now the test for a new version - WINDOWS.INC for 32 bit MASM (Version 1.6 RELEASE January 2012)

Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz
Counting 1000* the lines in Windows.inc
naive   MasmB   w/frame lfcnt
838     60      60 ms   747 ms
837     56      61 ms   748 ms
854     58      60 ms   742 ms
894     57      63 ms   750 ms
848     56      63 ms   752 ms
846     57      60 ms   751 ms
840     57      61 ms   752 ms
834     54      62 ms   749 ms
26902 lines found


Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
382 ms, sum alex=799979533
345 ms, sum mb32=799979533
313 ms, sum mb16=799979533
297 ms, sum popc=799979533

390 ms, sum alex=799979533
343 ms, sum mb32=799979533
298 ms, sum mb16=799979533
344 ms, sum popc=799979533

378 ms, sum alex=799979533
344 ms, sum mb32=799979533
317 ms, sum mb16=799979533
290 ms, sum popc=799979533


Best regards
Title: Re: PopCount
Post by: jj2007 on June 22, 2016, 02:04:14 AM
Quote from: LiaoMi on June 21, 2016, 11:08:26 PMnew version - WINDOWS.INC for 32 bit MASM (Version 1.6 RELEASE January 2012)
...
26902 lines found

Yep, that's the current one :t

Question is now what Siekmanski has put into the over 4,000 extra lines ::)
Title: Re: PopCount
Post by: Siekmanski on June 22, 2016, 05:47:44 AM
Quote from: jj2007 on June 22, 2016, 02:04:14 AM

Question is now what Siekmanski has put into the over 4,000 extra lines ::)

The Dutch are the tallest people on earth and also have the largest Windows.Inc files.  :biggrin: