Author Topic: PopCount  (Read 3697 times)

MichaelW

  • Global Moderator
  • Member
  • *****
  • Posts: 1209
PopCount
« on: August 16, 2014, 02:31:46 AM »
The attachment contains a test of the POPCNT instruction against the sort of code previously used for the purpose.

The very repeatable results running on my Core-i3 G3220:
Code: [Select]
9 cycles, popcnt*10
36 cycles, popcnt1*10
11 cycles, popcnt2*10

9 cycles, popcnt*10
36 cycles, popcnt1*10
11 cycles, popcnt2*10

9 cycles, popcnt*10
36 cycles, popcnt1*10
11 cycles, popcnt2*10

For timing single instructions, and for recent processors even short sequences of instructions, the one-cycle resolution of the TSC presents a problem. It's difficult to know what constitutes a representative test. 

BTW, I assembled with JWASM.

Well Microsoft, here’s another nice mess you’ve gotten us into.

jj2007

  • Member
  • *****
  • Posts: 7558
  • Assembler is fun ;-)
    • MasmBasic
Re: PopCount
« Reply #1 on: August 16, 2014, 03:37:43 AM »
For comparison, PopCount() on a CPU that doesn't support the native popcnt

Intel(R) Celeron(R) M CPU
76 cycles, popcntMB*10
91 cycles, popcnt1*10
22 cycles, popcnt2*10

Gunther

  • Member
  • *****
  • Posts: 3515
  • Forgive your enemies, but never forget their names
Re: PopCount
« Reply #2 on: August 16, 2014, 08:40:30 AM »
Hi Michael,

here are your timings:
Quote
8 cycles, popcnt*10
40 cycles, popcnt1*10
13 cycles, popcnt2*10

8 cycles, popcnt*10
40 cycles, popcnt1*10
12 cycles, popcnt2*10

8 cycles, popcnt*10
40 cycles, popcnt1*10
12 cycles, popcnt2*10

Press any key to continue ...

Gunther
Get your facts first, and then you can distort them.

Siekmanski

  • Member
  • *****
  • Posts: 1094
Re: PopCount
« Reply #3 on: August 16, 2014, 08:47:53 AM »
 i7-4930K CPU @ 3.40GHz
 win 8.1 x64

Quote

8 cycles, popcnt*10
45 cycles, popcnt1*10
13 cycles, popcnt2*10

9 cycles, popcnt*10
45 cycles, popcnt1*10
13 cycles, popcnt2*10

9 cycles, popcnt*10
45 cycles, popcnt1*10
14 cycles, popcnt2*10

Press any key to continue ...

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 4813
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: PopCount
« Reply #4 on: August 16, 2014, 08:48:28 AM »
This is the result on my i7 64 bit box.

Code: [Select]


4 cycles, popcnt*10
58 cycles, popcnt1*10
14 cycles, popcnt2*10

4 cycles, popcnt*10
58 cycles, popcnt1*10
14 cycles, popcnt2*10

4 cycles, popcnt*10
58 cycles, popcnt1*10
14 cycles, popcnt2*10

Press any key to continue ...
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :biggrin:

jj2007

  • Member
  • *****
  • Posts: 7558
  • Assembler is fun ;-)
    • MasmBasic
Re: PopCount
« Reply #5 on: August 16, 2014, 09:23:16 AM »
Not an easy choice:

- "true" popcnt excludes many older CPUs
- popcnt1 is kind of slow
- popcnt2 adds 256k of bloat

Attached the code I used for the test above.

jj2007

  • Member
  • *****
  • Posts: 7558
  • Assembler is fun ;-)
    • MasmBasic
Re: PopCount
« Reply #6 on: June 19, 2016, 09:52:10 AM »
There is a 2010 thread here with code that uses this popcnt implementation by Alex/Antariy:
Code: [Select]
@popcnt:
movzx ecx,al
shr eax,8
imul ecx,08040201h
imul eax,08040201h
shr ecx,3
shr eax,3
and ecx,11111111h
and eax,11111111h
imul ecx,11111111h
imul eax,11111111h
shr ecx,28
shr eax,28
add eax, ecx  ; <<< not in the source
ret

Problem is that a) I am quite sure it ran fine in 2010, and very fast, b) it crashes now.
I have stared at it for quite a while, but the fog doesn't go away. Any ideas?

P.S.: Alex's popcount is 16-bit only, and would need an add eax, ecx to work (which in the source is present but higher up).

Here are some timings (the last one is the native popcnt instruction):
Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
653 ms, sum alex=799979533
543 ms, sum mb32=799979533
475 ms, sum mb16=799979533
446 ms, sum popc=799979533

631 ms, sum alex=799979533
549 ms, sum mb32=799979533
478 ms, sum mb16=799979533
482 ms, sum popc=799979533

649 ms, sum alex=799979533
560 ms, sum mb32=799979533
472 ms, sum mb16=799979533
449 ms, sum popc=799979533

jj2007

  • Member
  • *****
  • Posts: 7558
  • Assembler is fun ;-)
    • MasmBasic
Re: PopCount
« Reply #7 on: June 20, 2016, 10:21:29 AM »
Can I have some timings please?

Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Counting 1000* the lines in Windows.inc
naive   MasmB   w/frame lfcnt
1154    75      84 ms   1145 ms
1111    77      84 ms   1143 ms
1122    79      86 ms   1138 ms
1113    78      86 ms   1147 ms
1217    76      84 ms   1142 ms
1245    75      88 ms   1152 ms
1107    75      86 ms   1146 ms
1106    75      84 ms   1145 ms
26902 lines found

dedndave

  • Member
  • *****
  • Posts: 8734
  • Still using Abacus 2.0
    • DednDave
Re: PopCount
« Reply #8 on: June 20, 2016, 10:39:00 AM »
Code: [Select]
Intel(R) Pentium(R) 4 CPU 3.00GHz
Counting 1000* the lines in Windows.inc
naive   MasmB   w/frame lfcnt
1515    296     290 ms  1937 ms
1528    292     309 ms  1925 ms
1521    299     291 ms  1934 ms
1518    290     298 ms  1926 ms
1518    296     291 ms  1934 ms
1524    287     295 ms  1934 ms
1605    292     291 ms  1933 ms
1531    286     297 ms  1930 ms
26894 lines found

jj2007

  • Member
  • *****
  • Posts: 7558
  • Assembler is fun ;-)
    • MasmBasic
Re: PopCount
« Reply #9 on: June 20, 2016, 11:03:26 AM »
Thanks, Dave, and welcome back :icon14:

Btw your windows.inc seems 8 lines shorter than mine, which is 977412 bytes of 10 Jan 2012.

P.S.: We just elected a new Mayor of Rome:

sinsi

  • Member
  • ****
  • Posts: 996
Re: PopCount
« Reply #10 on: June 20, 2016, 02:50:20 PM »
Code: [Select]
Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Counting 1000* the lines in Windows.inc
naive   MasmB   w/frame lfcnt
754     50      54 ms   669 ms
754     50      55 ms   693 ms
772     51      53 ms   675 ms
750     50      54 ms   665 ms
768     50      53 ms   669 ms
776     50      55 ms   667 ms
743     50      53 ms   670 ms
744     50      53 ms   668 ms
26902 lines found
I can walk on water but stagger on beer.

jj2007

  • Member
  • *****
  • Posts: 7558
  • Assembler is fun ;-)
    • MasmBasic
Re: PopCount
« Reply #11 on: June 20, 2016, 04:45:17 PM »
i7 likes algos developed on little brother :bgrin:

mabdelouahab

  • Member
  • ***
  • Posts: 336
Re: PopCount
« Reply #12 on: June 20, 2016, 10:24:42 PM »
Quote
Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Counting 1000* the lines in Windows.inc
naive   MasmB   w/frame lfcnt
1140    96      109 ms  1114 ms
1163    85      88 ms   1053 ms
1209    80      83 ms   1037 ms
1144    77      82 ms   1057 ms
1156    77      82 ms   1059 ms
1161    80      84 ms   1017 ms
1140    78      83 ms   1041 ms
1140    77      83 ms   1015 ms
26902 lines found

LiaoMi

  • Member
  • **
  • Posts: 143
Re: PopCount
« Reply #13 on: June 21, 2016, 08:32:25 AM »
Code: [Select]
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
384 ms, sum alex=799979533
346 ms, sum mb32=799979533
311 ms, sum mb16=799979533
297 ms, sum popc=799979533

395 ms, sum alex=799979533
347 ms, sum mb32=799979533
302 ms, sum mb16=799979533
345 ms, sum popc=799979533

384 ms, sum alex=799979533
348 ms, sum mb32=799979533
318 ms, sum mb16=799979533
295 ms, sum popc=799979533

Code: [Select]
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz
Counting 1000* the lines in Windows.inc
naive   MasmB   w/frame lfcnt
723     48      52 ms   648 ms
714     50      52 ms   647 ms
743     48      52 ms   648 ms
774     49      52 ms   652 ms
724     48      53 ms   649 ms
734     49      52 ms   640 ms
716     50      53 ms   646 ms
722     49      52 ms   643 ms
22274 lines found


jj2007

  • Member
  • *****
  • Posts: 7558
  • Assembler is fun ;-)
    • MasmBasic
Re: PopCount
« Reply #14 on: June 21, 2016, 08:43:58 AM »
Thanks, LiaoMi :icon14:

Your line count is low - is that the latest Masm32 SDK?