Author Topic: Check the parity of a DWORD  (Read 2455 times)

daydreamer

  • Member
  • *****
  • Posts: 1721
  • building nextdoor
Re: Check the parity of a DWORD
« Reply #15 on: February 23, 2021, 06:17:36 AM »
Quote
Btw, how old a CPU do you really want to test on? I am certain an emulator could emulate a 486DX, or whatever the oldest CPU is that can perform bswap  :tongue: :badgrin:

I was more concerned about popcnt ;-)

Quote
Any practical use of getting the parity bit in some PROC?

Probably not - since when do we need a reason for testing algos in the Lab? :tongue:
I ran a benchmark of xchg vs mov earlier, from the masm32 SDK, so if you replace the xchg, does it become a little faster?
Testing algos in the Lab is kind of similar to testing an alternative algo that solves the problem using different "untouched" mnemonics/opcodes.
I already tested an alternative using pshufb, so maybe benchmark pshufb vs bswap?
SIMD fan and macro fan
why assembly is fastest is because its switch has no (brakes) breaks
:P
only in 16bit assembly you can get away with "Only words" :P

jj2007

  • Member
  • *****
  • Posts: 11551
  • Assembler is fun ;-)
    • MasmBasic
Re: Check the parity of a DWORD
« Reply #16 on: February 23, 2021, 07:49:04 AM »
Quote
I ran a benchmark of xchg vs mov earlier, from the masm32 SDK, so if you replace the xchg, does it become a little faster?
Testing algos in the Lab is kind of similar to testing an alternative algo that solves the problem using different "untouched" mnemonics/opcodes.
I already tested an alternative using pshufb, so maybe benchmark pshufb vs bswap?

xchg and bswap do different things, not useful here.

pshufb might be a tick faster, but I'd like to remain on the safe side: this is a recent instruction not supported by older CPUs.
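
To make the distinction concrete, here is a minimal C sketch (C is used only for illustration; the thread itself benchmarks assembly). `bswap32` and `exchange32` are hypothetical helper names, not part of the masm32 SDK: the first mirrors what BSWAP does to a single 32-bit register, the second what XCHG does with its two operands.

```c
#include <stdint.h>

/* Byte reversal of ONE 32-bit value: the effect of x86 BSWAP on a register. */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Exchange of TWO values: the effect of XCHG on its two operands. */
void exchange32(uint32_t *a, uint32_t *b)
{
    uint32_t tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Reversing the bytes only permutes the bits, so `bswap32(x)` always has the same parity as `x`; exchanging two values says nothing about the parity of either.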

mikeburr

  • Member
  • **
  • Posts: 134
Re: Check the parity of a DWORD
« Reply #17 on: February 23, 2021, 11:42:40 AM »
Does BSWAP lock the bus as XCHG does?
Regards, mikeb

daydreamer

  • Member
  • *****
  • Posts: 1721
  • building nextdoor
Re: Check the parity of a DWORD
« Reply #18 on: February 23, 2021, 09:23:48 PM »
Quote
xchg and bswap do different things, not useful here.

pshufb might be a tick faster, but I'd like to remain on the safe side: this is a recent instruction not supported by older CPUs.

I found the xchg_test.exe timing in the masm32 examples, which times xchg vs mov; feel free to ignore this timing: :tongue: :bgrin:
Code:
727 cycles, (xchg reg,reg)*100
8148 cycles, (xchg reg,mem)*100
7062 cycles, (xchg mem,reg)*100
430 cycles, (exchange reg,reg)*100 using mov
952 cycles, (exchange reg,mem)*100 using mov

Press any key to exit...

jj2007

  • Member
  • *****
  • Posts: 11551
  • Assembler is fun ;-)
    • MasmBasic
Re: Check the parity of a DWORD
« Reply #19 on: February 24, 2021, 06:00:48 AM »
Quote
does BSWAP lock the bus as XCHG does ???

No.

Fredyla027

  • Regular Member
  • *
  • Posts: 1
Re: Check the parity of a DWORD
« Reply #20 on: September 05, 2021, 10:06:32 PM »
popcnt is faster on this old AMD (or the timing has a problem :biggrin:)

Code:
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

786     cycles for 400 * setnp sar
396     cycles for 400 * popcnt
3750    cycles for 400 * PopCount
787     cycles for 400 * setnp bswap

787     cycles for 400 * setnp sar
398     cycles for 400 * popcnt
3749    cycles for 400 * PopCount
784     cycles for 400 * setnp bswap

785     cycles for 400 * setnp sar
395     cycles for 400 * popcnt
3748    cycles for 400 * PopCount
785     cycles for 400 * setnp bswap

786     cycles for 400 * setnp sar
398     cycles for 400 * popcnt
3750    cycles for 400 * PopCount
783     cycles for 400 * setnp bswap
Wouldn't it be more standard today to use a Unicode (UTF-16) string reverse?
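
For readers who want to check such variants against each other, here is a hedged C sketch of what the timed routines compute: the parity of a 32-bit value. It is not the code behind the timings above, and `parity32_naive` and `parity32_fold` are hypothetical names. On x86 the parity flag only reflects the low 8 bits of a result, so an assembly version would typically fold the DWORD down to one byte and then read the flag with setnp; the portable version below simply keeps folding down to a single bit.

```c
#include <stdint.h>

/* Reference version: walk the bits one by one.
   Returns 1 if the number of set bits is odd, 0 if it is even. */
unsigned parity32_naive(uint32_t x)
{
    unsigned odd = 0;
    while (x != 0) {
        odd ^= (unsigned)(x & 1u);
        x >>= 1;
    }
    return odd;
}

/* XOR folding: each step XORs the upper half onto the lower half,
   halving the width that matters while preserving the parity. */
unsigned parity32_fold(uint32_t x)
{
    x ^= x >> 16;   /* 32 -> 16 bits */
    x ^= x >> 8;    /* 16 -> 8 bits (x86 could read the parity flag here) */
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (unsigned)(x & 1u);
}
```

Where a population count is available, the same result is the lowest bit of the count, which is what a popcnt-based variant would rely on.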