Check the parity of a DWORD

Started by jj2007, February 21, 2021, 12:08:24 PM

daydreamer

Quote from: jj2007 on February 23, 2021, 05:54:09 AM
Quote from: daydreamer on February 23, 2021, 04:56:48 AM
Btw, how old a CPU do you really want to test on? I am certain an emulator could emulate a 486DX, or whatever the oldest CPU is that can execute bswap. :tongue: :badgrin:

I was more concerned about popcnt ;-)

Quote
Any practical use for getting the parity bit in some PROC?

Probably not - since when do we need a reason for testing algos in the Lab? :tongue:
I ran a benchmark of xchg vs mov from the MASM32 SDK earlier; if you replace xchg, does it become a little faster?
Testing algos in the Lab is kind of like testing an alternative algo that solves the problem using different, "untouched" mnemonics/opcodes.
I already tested an alternative using pshufb, so maybe benchmark pshufb vs bswap?
my non-asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

jj2007

Quote from: daydreamer on February 23, 2021, 06:17:36 AM
I ran a benchmark of xchg vs mov from the MASM32 SDK earlier; if you replace xchg, does it become a little faster?
Testing algos in the Lab is kind of like testing an alternative algo that solves the problem using different, "untouched" mnemonics/opcodes.
I already tested an alternative using pshufb, so maybe benchmark pshufb vs bswap?

xchg and bswap do different things, so comparing them is not useful here.

pshufb might be a tick faster, but I'd like to remain on the safe side: this is a recent instruction not supported by older CPUs.

mikeburr

Does BSWAP lock the bus like XCHG does???
Regards, mikeb

daydreamer

Quote from: jj2007 on February 23, 2021, 07:49:04 AM
Quote from: daydreamer on February 23, 2021, 06:17:36 AM
I ran a benchmark of xchg vs mov from the MASM32 SDK earlier; if you replace xchg, does it become a little faster?
Testing algos in the Lab is kind of like testing an alternative algo that solves the problem using different, "untouched" mnemonics/opcodes.
I already tested an alternative using pshufb, so maybe benchmark pshufb vs bswap?

xchg and bswap do different things, so comparing them is not useful here.

pshufb might be a tick faster, but I'd like to remain on the safe side: this is a recent instruction not supported by older CPUs.

I found xchg_test.exe in the MASM32 examples; it times xchg vs mov. Feel free to ignore these timings: :tongue: :bgrin:
727 cycles, (xchg reg,reg)*100
8148 cycles, (xchg reg,mem)*100
7062 cycles, (xchg mem,reg)*100
430 cycles, (exchange reg,reg)*100 using mov
952 cycles, (exchange reg,mem)*100 using mov


jj2007


Fredyla027

Quote from: HSE on February 22, 2021, 05:58:09 AM
popcnt is faster on the old AMD (or the timing has a problem :biggrin:)

AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

786     cycles for 400 * setnp sar
396     cycles for 400 * popcnt
3750    cycles for 400 * PopCount
787     cycles for 400 * setnp bswap

787     cycles for 400 * setnp sar
398     cycles for 400 * popcnt
3749    cycles for 400 * PopCount
784     cycles for 400 * setnp bswap

785     cycles for 400 * setnp sar
395     cycles for 400 * popcnt
3748    cycles for 400 * PopCount
785     cycles for 400 * setnp bswap

786     cycles for 400 * setnp sar
398     cycles for 400 * popcnt
3750    cycles for 400 * PopCount
783     cycles for 400 * setnp bswap

Wouldn't it be more standard today to reverse a UTF-16 (Unicode) string?