Yes, I use the AMD K10. I have solved the problem: maskmovq/maskmovdqu costs more than four instructions and a lookup table.
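.section .data
.balign 16                      # pand/pcmpeqb with a memory operand need 16-byte alignment
bts0_: .quad 0x8080808080808080,0x8080808080808080
.section .text
.globl _start
_start:
    # input bytes are expected in %xmm0; the result is left in %xmm1
    movdqa  %xmm0, %xmm1        # keep a copy of the original bytes
    pand    bts0_, %xmm0        # isolate the high bit of each byte
    pcmpeqb bts0_, %xmm0        # 0xFF where the high bit was set, 0x00 elsewhere
    pand    %xmm0, %xmm1        # zero every byte whose high bit was clear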
This is more efficient. Sorry for the AT&T syntax; I assemble on a Unix-like system.
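For anyone who wants to check the result against a reference, here is a rough scalar model in C of what the four instructions do to each byte lane. It is only a sketch: the function name keep_high_bit_bytes is made up for illustration and is not from my real code, and it processes one byte at a time instead of 16 at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the SSE2 sequence, one byte lane per iteration.
   keep_high_bit_bytes is a hypothetical name used only for this sketch. */
void keep_high_bit_bytes(const uint8_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t high = src[i] & 0x80;                 /* pand bts0_, %xmm0    */
        uint8_t mask = (high == 0x80) ? 0xFF : 0x00;  /* pcmpeqb bts0_, %xmm0 */
        dst[i] = src[i] & mask;                       /* pand %xmm0, %xmm1    */
    }
}
```

Bytes with the high bit set pass through unchanged; all other bytes come out as zero.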
Thanks for the replies and the welcome.