The MASM Forum

General => The Laboratory => Topic started by: jj2007 on February 21, 2021, 12:08:24 PM

Title: Check the parity of a DWORD
Post by: jj2007 on February 21, 2021, 12:08:24 PM
Can I have some timings, please, especially from old CPUs? Thanks :thup:

Note that the popcnt instruction and the PopCount macro perform a bit more: they count the actual number of bits set, and parity gets checked via the and eax, 1 - see below.

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

854     cycles for 100 * setnp
767     cycles for 100 * popcnt
1279    cycles for 100 * PopCount

851     cycles for 100 * setnp
766     cycles for 100 * popcnt
1276    cycles for 100 * PopCount

855     cycles for 100 * setnp
766     cycles for 100 * popcnt
1283    cycles for 100 * PopCount

858     cycles for 100 * setnp
768     cycles for 100 * popcnt
1285    cycles for 100 * PopCount

45      = eax setnp
45      = eax popcnt
45      = eax PopCount


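setnp:
void Rand(-1) ; get a random DWORD, 0 ... -1
xor al, ah    ; al = byte 0 xor byte 1
mov edx, eax  ; keep the partial result
bswap eax     ; byte 3 is now in al, byte 2 in ah
xor al, ah    ; al = byte 3 xor byte 2
xor eax, edx  ; al = xor of all four bytes; the parity flag reflects the low byte only
setnp al      ; al = 1 if the number of set bits is odd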


popcnt:
popcnt eax, Rand(-1)
and al, 1


PopCount (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1029):
void PopCount(Rand(-1))
and eax, 1
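
For readers who prefer C, here is a minimal sketch of what the setnp and popcnt listings above compute. The x86 parity flag reflects only the low 8 bits of a result, which is why the setnp version first XORs the four bytes together. The function names are made up for illustration and are not part of MasmBasic; the loop-based bit count is a portable stand-in for the popcnt instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for popcnt: counts the set bits of x. */
static unsigned bit_count(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Mirrors the popcnt listing: count all bits, keep the lowest bit of the count. */
static unsigned parity_by_count(uint32_t x)
{
    return bit_count(x) & 1u;
}

/* Mirrors the setnp listing: XOR the four bytes into one, then take the
   parity of that single byte (the part that setnp reads from the flags). */
static unsigned parity_by_fold(uint32_t x)
{
    uint8_t folded = (uint8_t)(x ^ (x >> 8) ^ (x >> 16) ^ (x >> 24));
    return bit_count(folded) & 1u;   /* 1 = odd number of set bits */
}

/* Hypothetical self-check: both routes must agree for a given value. */
static void check_agreement(uint32_t x)
{
    assert(parity_by_fold(x) == parity_by_count(x));
}
```

Both functions return 1 for an odd number of set bits, so they should agree for every input, which is what the identical "= eax" lines in the timings suggest for the assembly versions.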
Title: Re: Check the parity of a DWORD
Post by: LiaoMi on February 21, 2021, 11:19:00 PM
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

759     cycles for 100 * setnp
758     cycles for 100 * popcnt
1056    cycles for 100 * PopCount

876     cycles for 100 * setnp
915     cycles for 100 * popcnt
1143    cycles for 100 * PopCount

761     cycles for 100 * setnp
712     cycles for 100 * popcnt
1051    cycles for 100 * PopCount

769     cycles for 100 * setnp
713     cycles for 100 * popcnt
1038    cycles for 100 * PopCount

45      = eax setnp
45      = eax popcnt
45      = eax PopCount

--- ok ---
Title: Re: Check the parity of a DWORD
Post by: TimoVJL on February 22, 2021, 12:04:57 AM
AMD Athlon(tm) II X2 220 Processor (SSE3)

1003    cycles for 100 * setnp
831     cycles for 100 * popcnt
1611    cycles for 100 * PopCount

1003    cycles for 100 * setnp
831     cycles for 100 * popcnt
1611    cycles for 100 * PopCount

1018    cycles for 100 * setnp
831     cycles for 100 * popcnt
1611    cycles for 100 * PopCount

1003    cycles for 100 * setnp
831     cycles for 100 * popcnt
1708    cycles for 100 * PopCount

45      = eax setnp
45      = eax popcnt
45      = eax PopCount
Title: Re: Check the parity of a DWORD
Post by: jj2007 on February 22, 2021, 12:52:31 AM
Thanks, LiaoMi and Timo :thumbsup:

I've cooked up a variant (setnp sar, 5% faster than setnp using bswap), and rearranged the testbed so that the (pretty fast) Rand(-1) (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1030) no longer influences the results. Timings:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

739     cycles for 400 * setnp sar
932     cycles for 400 * popcnt
2357    cycles for 400 * PopCount
780     cycles for 400 * setnp bswap

735     cycles for 400 * setnp sar
934     cycles for 400 * popcnt
2364    cycles for 400 * PopCount
781     cycles for 400 * setnp bswap

736     cycles for 400 * setnp sar
931     cycles for 400 * popcnt
2354    cycles for 400 * PopCount
777     cycles for 400 * setnp bswap

736     cycles for 400 * setnp sar
936     cycles for 400 * popcnt
2340    cycles for 400 * PopCount
777     cycles for 400 * setnp bswap
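
The code of the setnp sar variant is not shown in this post. As a guess at the idea only: a shift can replace bswap to bring the upper half of the DWORD down before the final fold. A C sketch under that assumption (the function name and the 0x6996 lookup constant are illustrative, not taken from the testbed):

```c
#include <stdint.h>

/* Assumed shape of a shift-based fold: 32 -> 16 -> 8 -> 4 bits, then a
   16-entry parity table packed into the constant 0x6996. */
static unsigned parity_by_shift(uint32_t x)
{
    x ^= x >> 16;   /* fold the upper word onto the lower word */
    x ^= x >> 8;    /* fold the upper byte onto the lower byte */
    x ^= x >> 4;    /* fold the upper nibble onto the lower nibble */
    return (0x6996u >> (x & 0xFu)) & 1u;   /* bit i of 0x6996 = parity of i */
}
```

In assembly the steps after the 16-bit fold would probably be left to xor al, ah and setnp. Whether sar or shr does the shift makes no difference here, because only the low 16 bits of the shifted value matter.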
Title: Re: Check the parity of a DWORD
Post by: FORTRANS on February 22, 2021, 01:57:03 AM
Pentium III doesn't run MasmBasic, as per usual.

= = =
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

1786 cycles for 100 * setnp

Parity32 has encountered a problem and needs to close.
We are sorry for the inconvenience.

{Stuff...}
Exception information...
code: 0xc000001d
address: 0x0...0401fa
...

= = =
   I had to monkey about with Windows Defender to
get the following to run.

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)

1074 cycles for 100 * setnp
984 cycles for 100 * popcnt
1501 cycles for 100 * PopCount

949 cycles for 100 * setnp
986 cycles for 100 * popcnt
1430 cycles for 100 * PopCount

1327 cycles for 100 * setnp
937 cycles for 100 * popcnt
1327 cycles for 100 * PopCount

1255 cycles for 100 * setnp
897 cycles for 100 * popcnt
1447 cycles for 100 * PopCount

45 = eax setnp
45 = eax popcnt
45 = eax PopCount

--- ok ---

Title: Re: Check the parity of a DWORD
Post by: hutch-- on February 22, 2021, 02:23:44 AM

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)        ; about 5 years old

780     cycles for 100 * setnp
737     cycles for 100 * popcnt
1064    cycles for 100 * PopCount

774     cycles for 100 * setnp
735     cycles for 100 * popcnt
1065    cycles for 100 * PopCount

776     cycles for 100 * setnp
740     cycles for 100 * popcnt
1066    cycles for 100 * PopCount

776     cycles for 100 * setnp
735     cycles for 100 * popcnt
1062    cycles for 100 * PopCount

45      = eax setnp
45      = eax popcnt
45      = eax PopCount

--- ok ---

Version 2

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)

651     cycles for 400 * setnp sar
961     cycles for 400 * popcnt
2179    cycles for 400 * PopCount
568     cycles for 400 * setnp bswap

651     cycles for 400 * setnp sar
955     cycles for 400 * popcnt
2177    cycles for 400 * PopCount
572     cycles for 400 * setnp bswap

651     cycles for 400 * setnp sar
950     cycles for 400 * popcnt
2182    cycles for 400 * PopCount
572     cycles for 400 * setnp bswap

656     cycles for 400 * setnp sar
951     cycles for 400 * popcnt
2178    cycles for 400 * PopCount
569     cycles for 400 * setnp bswap

205     = eax setnp sar
205     = eax popcnt
205     = eax PopCount
205     = eax setnp bswap

--- ok ---
Title: Re: Check the parity of a DWORD
Post by: TimoVJL on February 22, 2021, 03:15:21 AM
AMD Athlon(tm) II X2 220 Processor (SSE3)

813     cycles for 400 * setnp sar
407     cycles for 400 * popcnt
3213    cycles for 400 * PopCount
810     cycles for 400 * setnp bswap

808     cycles for 400 * setnp sar
405     cycles for 400 * popcnt
3210    cycles for 400 * PopCount
807     cycles for 400 * setnp bswap

810     cycles for 400 * setnp sar
406     cycles for 400 * popcnt
3209    cycles for 400 * PopCount
927     cycles for 400 * setnp bswap

807     cycles for 400 * setnp sar
405     cycles for 400 * popcnt
3329    cycles for 400 * PopCount
810     cycles for 400 * setnp bswap

205     = eax setnp sar
205     = eax popcnt
205     = eax PopCount
205     = eax setnp bswap
Title: Re: Check the parity of a DWORD
Post by: jj2007 on February 22, 2021, 04:28:06 AM
Timo's Athlon has an ultra-fast popcnt instruction. The other surprise is Hutch's i7, with a 15% faster "bswap" algo - mine is 5% slower :cool:

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)

651     cycles for 400 * setnp sar
961     cycles for 400 * popcnt
2179    cycles for 400 * PopCount
568     cycles for 400 * setnp bswap


Thanks to all :thup:
Title: Re: Check the parity of a DWORD
Post by: HSE on February 22, 2021, 05:58:09 AM
popcnt is faster on old AMD CPUs (or the timing has a problem :biggrin:)

AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

786     cycles for 400 * setnp sar
396     cycles for 400 * popcnt
3750    cycles for 400 * PopCount
787     cycles for 400 * setnp bswap

787     cycles for 400 * setnp sar
398     cycles for 400 * popcnt
3749    cycles for 400 * PopCount
784     cycles for 400 * setnp bswap

785     cycles for 400 * setnp sar
395     cycles for 400 * popcnt
3748    cycles for 400 * PopCount
785     cycles for 400 * setnp bswap

786     cycles for 400 * setnp sar
398     cycles for 400 * popcnt
3750    cycles for 400 * PopCount
783     cycles for 400 * setnp bswap
Title: Re: Check the parity of a DWORD
Post by: LiaoMi on February 22, 2021, 06:00:39 AM
V.2

Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

644     cycles for 400 * setnp sar
937     cycles for 400 * popcnt
2141    cycles for 400 * PopCount
592     cycles for 400 * setnp bswap

645     cycles for 400 * setnp sar
930     cycles for 400 * popcnt
2118    cycles for 400 * PopCount
565     cycles for 400 * setnp bswap

635     cycles for 400 * setnp sar
957     cycles for 400 * popcnt
2208    cycles for 400 * PopCount
871     cycles for 400 * setnp bswap

757     cycles for 400 * setnp sar
1096    cycles for 400 * popcnt
2162    cycles for 400 * PopCount
615     cycles for 400 * setnp bswap

205     = eax setnp sar
205     = eax popcnt
205     = eax PopCount
205     = eax setnp bswap

--- ok ---
Title: Re: Check the parity of a DWORD
Post by: jj2007 on February 22, 2021, 09:51:42 AM
Quote from: HSE on February 22, 2021, 05:58:09 AM
popcnt is faster on old AMD CPUs (or the timing has a problem :biggrin:)

The timings are not ultra-stable, but it's pretty clear from your and Timo's CPUs that AMD has a much faster popcnt implementation. The i7s, in contrast, perform better on the setnp bswap version.
Title: Re: Check the parity of a DWORD
Post by: quarantined on February 22, 2021, 03:25:10 PM
For the first test piece...


AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

783     cycles for 100 * setnp
685     cycles for 100 * popcnt
920     cycles for 100 * PopCount

758     cycles for 100 * setnp
683     cycles for 100 * popcnt
918     cycles for 100 * PopCount

770     cycles for 100 * setnp
686     cycles for 100 * popcnt
915     cycles for 100 * PopCount

755     cycles for 100 * setnp
682     cycles for 100 * popcnt
915     cycles for 100 * PopCount

45      = eax setnp
45      = eax popcnt
45      = eax PopCount

--- ok ---



And the second...


AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

609     cycles for 400 * setnp sar
317     cycles for 400 * popcnt
2406    cycles for 400 * PopCount
511     cycles for 400 * setnp bswap

610     cycles for 400 * setnp sar
360     cycles for 400 * popcnt
2420    cycles for 400 * PopCount
512     cycles for 400 * setnp bswap

605     cycles for 400 * setnp sar
313     cycles for 400 * popcnt
2403    cycles for 400 * PopCount
511     cycles for 400 * setnp bswap

609     cycles for 400 * setnp sar
310     cycles for 400 * popcnt
2401    cycles for 400 * PopCount
512     cycles for 400 * setnp bswap

205     = eax setnp sar
205     = eax popcnt
205     = eax PopCount
205     = eax setnp bswap

--- ok ---


I'm a little late to the party.  :cool:
Title: Re: Check the parity of a DWORD
Post by: TimoVJL on February 22, 2021, 06:19:29 PM
AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)

948     cycles for 100 * setnp
918     cycles for 100 * popcnt
1082    cycles for 100 * PopCount

951     cycles for 100 * setnp
905     cycles for 100 * popcnt
1090    cycles for 100 * PopCount

948     cycles for 100 * setnp
914     cycles for 100 * popcnt
1104    cycles for 100 * PopCount

957     cycles for 100 * setnp
917     cycles for 100 * popcnt
1114    cycles for 100 * PopCount

45      = eax setnp
45      = eax popcnt
45      = eax PopCount

AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)

694     cycles for 400 * setnp sar
307     cycles for 400 * popcnt
2853    cycles for 400 * PopCount
587     cycles for 400 * setnp bswap

699     cycles for 400 * setnp sar
272     cycles for 400 * popcnt
2861    cycles for 400 * PopCount
584     cycles for 400 * setnp bswap

697     cycles for 400 * setnp sar
292     cycles for 400 * popcnt
2851    cycles for 400 * PopCount
581     cycles for 400 * setnp bswap

700     cycles for 400 * setnp sar
269     cycles for 400 * popcnt
2857    cycles for 400 * PopCount
585     cycles for 400 * setnp bswap

205     = eax setnp sar
205     = eax popcnt
205     = eax PopCount
205     = eax setnp bswap
Title: Re: Check the parity of a DWORD
Post by: daydreamer on February 23, 2021, 04:56:48 AM
My newest machine:
I wonder whether bswap and shufb have about the same performance? Or is shufb capable of processing 4 x DWORDs inside xmm regs?
I knew old AMDs were known for faster FPU code; maybe that also includes some other opcodes.

This one has turbo up to 3.1 GHz. I am not sure the spin-up is enough, because I don't hear the fans, which is usually the sign of running at full speed. Edit: oops, I had it in a special energy-saving mode made for a steady "fixed" clock frequency when running an emulator, so that turbo does not go up and down and mess up a steady frame rate.
Btw, how old a CPU do you really want to test on? I am certain an emulator could emulate a 486DX or whatever the lowest CPU is that can perform bswap :tongue: :badgrin:

Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)

849     cycles for 100 * setnp
647     cycles for 100 * popcnt
1157    cycles for 100 * PopCount

823     cycles for 100 * setnp
662     cycles for 100 * popcnt
1148    cycles for 100 * PopCount

829     cycles for 100 * setnp
656     cycles for 100 * popcnt
1124    cycles for 100 * PopCount

822     cycles for 100 * setnp
652     cycles for 100 * popcnt
1137    cycles for 100 * PopCount

45      = eax setnp
45      = eax popcnt
45      = eax PopCount

-
-

second version
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)

698     cycles for 400 * setnp sar
1023    cycles for 400 * popcnt
1720    cycles for 400 * PopCount
600     cycles for 400 * setnp bswap

715     cycles for 400 * setnp sar
1024    cycles for 400 * popcnt
1714    cycles for 400 * PopCount
609     cycles for 400 * setnp bswap

706     cycles for 400 * setnp sar
1018    cycles for 400 * popcnt
1730    cycles for 400 * PopCount
594     cycles for 400 * setnp bswap

709     cycles for 400 * setnp sar
1014    cycles for 400 * popcnt
1762    cycles for 400 * PopCount
611     cycles for 400 * setnp bswap

205     = eax setnp sar
205     = eax popcnt
205     = eax PopCount
205     = eax setnp bswap

-

Is there any practical use for getting the parity bit in some PROC? Modifying a SEED depending on the parity bit?
Title: Re: Check the parity of a DWORD
Post by: jj2007 on February 23, 2021, 05:54:09 AM
Quote from: daydreamer on February 23, 2021, 04:56:48 AM
Btw, how old a CPU do you really want to test on? I am certain an emulator could emulate a 486DX or whatever the lowest CPU is that can perform bswap :tongue: :badgrin:

I was more concerned about popcnt ;-)

Quote
Is there any practical use for getting the parity bit in some PROC?

Probably not - since when do we need a reason for testing algos in the Lab? :tongue:
Title: Re: Check the parity of a DWORD
Post by: daydreamer on February 23, 2021, 06:17:36 AM
Quote from: jj2007 on February 23, 2021, 05:54:09 AM
Quote from: daydreamer on February 23, 2021, 04:56:48 AM
Btw, how old a CPU do you really want to test on? I am certain an emulator could emulate a 486DX or whatever the lowest CPU is that can perform bswap :tongue: :badgrin:

I was more concerned about popcnt ;-)

Quote
Is there any practical use for getting the parity bit in some PROC?

Probably not - since when do we need a reason for testing algos in the Lab? :tongue:

I ran an xchg vs mov benchmark from the masm32 SDK earlier, so if you exchange xchg, does it become a little faster?
Testing algos in the Lab is kind of similar to testing an alternative algo that solves the problem using different "untouched" mnemonics/opcodes.
I already tested an alternative using shufb, so maybe benchmark shufb vs bswap?
Title: Re: Check the parity of a DWORD
Post by: jj2007 on February 23, 2021, 07:49:04 AM
Quote from: daydreamer on February 23, 2021, 06:17:36 AM
I ran an xchg vs mov benchmark from the masm32 SDK earlier, so if you exchange xchg, does it become a little faster?
Testing algos in the Lab is kind of similar to testing an alternative algo that solves the problem using different "untouched" mnemonics/opcodes.
I already tested an alternative using shufb, so maybe benchmark shufb vs bswap?

xchg and bswap do different things, not useful here.

pshufb might be a tick faster, but I'd like to remain on the safe side: this is a recent instruction not supported by older CPUs.
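
Whether popcnt or pshufb is available can be checked at run time before choosing a code path. A minimal C sketch, assuming GCC or Clang on x86 (the `<cpuid.h>` header and `__get_cpuid` come from those compilers; the struct and function names are made up here). CPUID leaf 1 reports POPCNT in ECX bit 23 and SSSE3, the extension pshufb belongs to, in ECX bit 9.

```c
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* Hypothetical container for the two feature bits discussed in the thread. */
typedef struct {
    bool popcnt;   /* CPUID.01H:ECX bit 23 */
    bool ssse3;    /* CPUID.01H:ECX bit 9, needed for pshufb */
} cpu_features;

static cpu_features query_cpu_features(void)
{
    cpu_features f = { false, false };
#ifdef HAVE_CPUID_H
    unsigned eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.popcnt = ((ecx >> 23) & 1u) != 0;
        f.ssse3  = ((ecx >> 9) & 1u) != 0;
    }
#endif
    return f;   /* both flags stay false on non-x86 builds */
}
```

The 0xc000001d exception reported earlier in the thread is the Windows illegal-instruction status, which is what executing popcnt on a CPU without it would raise; a check like this lets a testbed skip that test instead.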
Title: Re: Check the parity of a DWORD
Post by: mikeburr on February 23, 2021, 11:42:40 AM
does BSWAP lock the bus as XCHG does ???
regards mikeb
Title: Re: Check the parity of a DWORD
Post by: daydreamer on February 23, 2021, 09:23:48 PM
Quote from: jj2007 on February 23, 2021, 07:49:04 AM
Quote from: daydreamer on February 23, 2021, 06:17:36 AM
I ran an xchg vs mov benchmark from the masm32 SDK earlier, so if you exchange xchg, does it become a little faster?
Testing algos in the Lab is kind of similar to testing an alternative algo that solves the problem using different "untouched" mnemonics/opcodes.
I already tested an alternative using shufb, so maybe benchmark shufb vs bswap?

xchg and bswap do different things, not useful here.

pshufb might be a tick faster, but I'd like to remain on the safe side: this is a recent instruction not supported by older CPUs.

I found the xchg_test.exe timing in the masm32 examples, which times xchg vs mov. Feel free to ignore this timing: :tongue: :bgrin:
727 cycles, (xchg reg,reg)*100
8148 cycles, (xchg reg,mem)*100
7062 cycles, (xchg mem,reg)*100
430 cycles, (exchange reg,reg)*100 using mov
952 cycles, (exchange reg,mem)*100 using mov

Press any key to exit...
Title: Re: Check the parity of a DWORD
Post by: jj2007 on February 24, 2021, 06:00:48 AM
Quote from: mikeburr on February 23, 2021, 11:42:40 AM
does BSWAP lock the bus as XCHG does ???

No.